Concurrent NetCore: From Policies to Pipelines 



Cole Schlesinger 

Princeton University 

35 Olden St. 
Princeton, NJ 08540 

cschlesi@cs.princeton.edu 



Michael Greenberg 

Princeton University 

35 Olden St. 
Princeton, NJ 08540 

mg19@cs.princeton.edu 



David Walker 

Princeton University 

35 Olden St. 
Princeton, NJ 08540 

dpw@cs.princeton.edu 



Abstract 

In a Software-Defined Network (SDN), a central, computationally 
powerful controller manages a set of distributed, computationally 
simple switches. The controller computes a policy describing how 
each switch should route packets and populates packet-processing 
tables on each switch with rules to enact the routing policy. As 
network conditions change, the controller continues to add and 
remove rules from switches to adjust the policy as needed. 

Recently, the SDN landscape has begun to change as several 
proposals for new, reconfigurable switching architectures, such as 
RMT [5] and FlexPipe [14], have emerged. These platforms provide 
switch programmers with many flexible tables for storing 
packet-processing rules, and they offer programmers control over 
the packet fields that each table can analyze and act on. These 
reconfigurable switch architectures support a richer SDN model 
in which a switch configuration phase precedes the rule popu- 
lation phase [4]. In the configuration phase, the controller sends 
the switch a graph describing the layout and capabilities of the 
packet processing tables it will require during the population phase. 
Armed with this foreknowledge, the switch can allocate its hard- 
ware (or software) resources more efficiently. 

We present a new, typed language, called Concurrent NetCore, 
for specifying routing policies and graphs of packet-processing 
tables. Concurrent NetCore includes features for specifying se- 
quential, conditional and concurrent control-flow between packet- 
processing tables. We develop a fine-grained operational model 
for the language and prove this model coincides with a higher- 
level denotational model when programs are well-typed. We also 
prove several additional properties of well-typed programs, includ- 
ing strong normalization and determinism. To illustrate the utility 
of the language, we develop linguistic models of both the RMT and 
FlexPipe architectures and we give a multi-pass compilation algo- 
rithm that translates graphs and routing policies to the RMT model. 

Categories and Subject Descriptors D.3.2 [Programming Lan- 
guages]: Language Classifications — Specialized application lan- 
guages 

General Terms Design, Languages, Theory 

1. Introduction 

Over the past several years, a new networking technology known 
as Software-Defined Networking (SDN) has emerged as a viable 
competitor to traditional networking infrastructure. In a software- 
defined network, a logically centralized controller machine (or 
cluster of machines) manages a distributed collection of switches. 
The controller is a general-purpose server whose primary job is to 
decide how to route packets through the network while avoiding 
congestion, managing security, handling failures, monitoring load, 
and informing network operators of problems. The switches, on the 
other hand, are specialized hardware devices with limited compu- 
tational facilities. In general, a switch implements a collection of 
simple rules that match bit patterns in the incoming packets, and 
based on those bit patterns, drop packets, modify their fields, for- 
ward the packets on to other switches, or send the packet to the 
controller for additional, more general analysis and processing. The 
switch itself does not decide what rules to implement — that job lies 
with the controller, which sends messages to the switches to in- 
stall and uninstall the packet-forwarding rules needed to achieve its 
higher-level, network-wide objectives. SDN is distinguished from 
traditional networks by its centralized, programmatic control. In 
contrast, traditional networks rely on distributed algorithms imple- 
mented by the switches, and network administrators manually con- 
figure each switch in the hope of inducing behavior that conforms 
to a global (and often poorly specified) network policy. 

SDN has had a tremendous impact in the networking community, 
both in industry and in academia. Google has adopted SDN 
to manage its internal backbone, which transmits all its 
intra-datacenter traffic, making it one of the largest networks in the 
world [9], and many other major companies are following Google's 
lead. Indeed, the board of the Open Networking Foundation 
(ONF) — the main body responsible for defining SDN standards, 
such as OpenFlow [10] — includes the owners of most of the largest 
networks in the world (Google, Facebook, Microsoft, etc.) and its 
corporate membership numbers over a hundred. On the academic 
side, hundreds of participants have attended the newly-formed 
HotSDN workshop, and several tracks of top networking confer- 
ences, such as NSDI and SIGCOMM, are dedicated to research in 
SDN. But at its heart, management of Software-Defined Networks 
is an important new programming problem that calls for a variety 
of new, high-level, declarative, domain-specific programming 
languages, as well as innovation in compiler design and 
implementation. 

OpenFlow 1.0: successes and failures. The OpenFlow protocol 
is a popular protocol for communication between the controller and 
switches. The first version, OpenFlow 1.0 [10], supported a simple 
abstraction: Each switch is a single table of packet-forwarding 
rules. Each such rule can match on one or more of twelve standard 
packet fields (source MAC, destination MAC, source IP, destination 
IP, VLAN, etc.) and then execute a series of actions, such as 
dropping the packet, modifying a field, or forwarding it out a port. 
A controller can issue commands to install and uninstall rules in 
the table and to query statistics associated with each rule (e.g., the 
number of packets or bytes processed). 

The single table abstraction was chosen for the first version of 
OpenFlow because it was a "least common denominator" inter- 
face that many existing switches could support with little change. 
It worked, and OpenFlow switches from several hardware ven- 
dors, including Broadcom and Intel, hit the market quickly. The 
simplicity of the OpenFlow 1.0 interface also made it a relatively 
easy compilation target for a wave of newly-designed, high-level 
SDN programming languages, such as Frenetic [7], Procera [15], 
Maple [16], FlowLog [13] and others. 

Unfortunately, while the simplicity of the OpenFlow 1.0 inter- 
face is extremely appealing, hardware vendors have been unable to 
devise implementations that make efficient use of switch resources. 
Packet processing hardware in most modern ASICs is not, in fact, 
implemented as a single match-action table, but rather as a collec- 
tion of tables. These tables are often aligned in sequence, so the 
effects of packet processing by one table can be observed by later 
tables, or in parallel, so non-conflicting actions may be executed 
concurrently to reduce packet-processing latency. 

Each table within a switch will typically match on a fixed subset 
of a packet's fields and will be responsible for implementing some 
subset of the chip's overall packet-forwarding functionality. More- 
over, different tables may be implemented using different kinds of 
memory with different properties. For example, some tables might 
be built with SRAM and only capable of exact matches on certain 
fields, that is, comparing fields against a single, concrete bit 
sequence (e.g., 1010001010). Other tables may use TCAM and be 
capable of ternary wildcard matches, where packets are compared to 
a string containing concrete bits and wildcards (e.g., 10?1??1001?), 
where each wildcard matches either 0 or 1. TCAM is substantially more 
expensive and power-hungry than SRAM. Hence, TCAM tables 
tend to be smaller than SRAM tables. For instance, the Broadcom Trident 
has an L2 table with SRAM capable of holding ~100K entries and 
a forwarding table with TCAM capable of holding ~4K entries [6]. 
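
The difference between exact and ternary matching can be sketched in a few lines of Python. This is an illustration only, not a description of any particular chip: the function names are hypothetical, and bit sequences are modeled as strings with "?" as the wildcard, following the notation used above.

```python
def exact_match(pattern: str, bits: str) -> bool:
    """SRAM-style exact match: every bit must be identical."""
    return pattern == bits


def ternary_match(pattern: str, bits: str) -> bool:
    """TCAM-style match: '?' in the pattern matches either 0 or 1."""
    return len(pattern) == len(bits) and all(
        p == "?" or p == b for p, b in zip(pattern, bits)
    )


print(exact_match("1010001010", "1010001010"))       # True
print(ternary_match("10?1??1001?", "10110010010"))   # True
print(ternary_match("10?1??1001?", "11110010010"))   # False: second bit differs
```

A single ternary pattern with k wildcards covers 2^k concrete bit sequences, each of which would need its own row in an exact-match table.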

In addition to building fixed-pipeline ASICs, switch hard- 
ware vendors are also developing more programmable hardware 
pipelines. For example, the RMT design [5] offers a programmable 
parser to extract data from packets in arbitrary application-driven 
ways, and a pipeline of 32 physical match-action tables. Each phys- 
ical table in this pipeline may be configured for use in different 
ways: (1) As a wide table, matching many bits at a time, but con- 
taining fewer rows, (2) as a narrower table, matching fewer bits 
in each packet but containing more rows, (3) as multiple paral- 
lel tables acting concurrently on a packet, or (4) combined with 
other physical tables in sequence to form a single, multi-step log- 
ical table. Intel's FlexPipe architecture [14] also contains a pro- 
grammable front end, but rather than organizing tables in a sequen- 
tial pipeline, FlexPipe contains a collection of parallel tables to 
allow concurrent packet processing, a shorter pipeline and reduced 
packet-processing latency. 

In theory, these multi-table hardware platforms could be pro- 
grammed through the single-table OpenFlow 1.0 interface. How- 
ever, doing so has several disadvantages: 

• The single OpenFlow 1.0 interface serves as a bottleneck in the 
compilation process: Merging rules from separate tables into 
a single table can lead to an explosion in the number of rules 
required to represent the same function as one might represent 
via a set of tables. 

Figure 1. Architecture of an OpenFlow 2.0 System 



• Once squeezed into a single table, the structure of the rule set 
is lost. Recovering that structure and determining how to split 
rules across tables is a non-trivial task, especially when the rules 
appear dynamically (without advance notice concerning their 
possible structure) at the switch. 

• Newer, more flexible chips such as RMT, FlexPipe or NetFP- 
GAs have a configuration stage, wherein one plans the configu- 
ration of tables and how to allocate different kinds of memory. 
The current OpenFlow protocol does not support configuration- 
time planning. 
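
The rule explosion mentioned in the first point above can be illustrated with a small, made-up example. The two tables below are hypothetical: one classifies packets by source address and the other picks an output port by destination address. Flattening them into one table that matches on both fields jointly requires one rule per combination.

```python
from itertools import product

# Hypothetical two-table pipeline (addresses and actions are made up).
table_a = {f"10.0.0.{i}": f"class{i}" for i in range(1, 4)}   # 3 rules on src
table_b = {f"10.0.1.{j}": j for j in range(1, 5)}             # 4 rules on dst

# A single table must match on (src, dst) jointly and carry both actions.
single = {
    (src, dst): (cls, port)
    for (src, cls), (dst, port) in product(table_a.items(), table_b.items())
}

print(len(table_a) + len(table_b))  # 7 rules across two tables
print(len(single))                  # 12 rules in the flattened table
```

With m and n rules in the two tables, the flattened table grows as m * n rather than m + n.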

Towards OpenFlow 2.0. As a result of the deficiencies of the 
first generation of OpenFlow protocols, a group of researchers have 
begun to define an architecture for the next generation of OpenFlow 
protocols [4] (see Figure 1). In this proposal, switch configuration 
is divided into two phases: table configuration and table population. 

During the table configuration phase, the SDN controller de- 
scribes the abstract set of tables it requires for its high-level routing 
policy. When describing these tables, it specifies the packet fields 
read and written by each table, and the sorts of patterns (either exact 
match or prefix match) that will be used. In addition, the table con- 
figuration describes the topology of the abstract tables — the order 
they appear in sequence (or in parallel) and the conditions neces- 
sary for executing the rules within a table. 

We call the tables communicated from controller to switch ab- 
stract, because they do not necessarily correspond directly to the 
concrete physical tables implemented by the switch hardware. In 
order to bridge the gap between abstract and concrete tables, a com- 
piler will attempt to find a mapping between what is requested by 
the controller and what is present in hardware. In the process of de- 
termining this mapping, the compiler will generate a function capa- 
ble of translating sets of abstract rules (also called an abstract pol- 
icy) supplied by the controller, and targeted at the abstract tables, 
into concrete rules/policy implementable directly on the concrete 
tables available in hardware. After the table configuration phase, 
and during the table population phase, the rule translator is used to 
transform abstract rules into concrete ones. 

The configuration phase happens on a human time scale: a net- 
work administrator writes a policy and a controller program and 
runs the compiler to configure the switches and SDN controllers on 
her network appropriately. Rule population, on the other hand, hap- 
pens on the time scale of network activity: a controller's algorithm 
may install, e.g., new firewall or NAT rules after observing a single 
packet — concrete examples of these and other rule installations can 
be found in Section 2. 

Contributions of this paper. The central contribution of this pa- 
per is the design of a new language for programming OpenFlow 
2.0 switches. This compiler intermediate language is capable of 
specifying high-level switch policies as well as concrete, low-level 
switch architectures. We call the language Concurrent NetCore (or 
CNC, for short), as it is inspired by past work on NetCore [7, 11] 
and NetKAT [3].¹ Like NetCore and NetKAT, Concurrent NetCore 
consists of a small number of primitive operations for specifying 
packet processing, plus combinators for constructing more com- 
plex packet processors from simpler ones. Concurrent NetCore in- 
troduces the following new features. 

• Table specifications: Table specifications act as "holes" in an 
otherwise fully-formed switch policy. These tables can be filled 
in (i.e., populated) later. Policies with tables serve as the 
phase-1 configurations in the OpenFlow 2.0 architecture. Ordinary, 
hole-free policies populate those holes later in the switch- 
configuration process. 

• Concurrent composition: Whereas NetCore and NetKAT have 
a form of "parallel composition," which copies a packet and 
performs different actions on different copies, CNC also pro- 
vides a new concurrent composition operator that allows two 
policies to act simultaneously on the same packet. We use con- 
current composition along with other features of CNC to model 
the RMT and Intel FlexPipe packet-processing pipelines. 

• Type System: Unlike past network programming languages, 
CNC is equipped with a simple domain-specific type system. 
These types perform two functions: (1) they determine the kinds 
of policies that may populate a table (which fields may be read 
or written, for instance), and thereby guarantee that well-typed 
policies can be compiled to the targeted table, and (2) they 
prevent interference between concurrently executing policies, 
thereby ensuring that the overall semantics of a CNC program 
is deterministic. 

The key technical results of the paper include the following: 

• Semantics for Concurrent NetCore: We define a small-step op- 
erational semantics for CNC that captures the intricate interac- 
tions between (nested) concurrent and parallel policies. In order 
to properly describe interacting concurrent actions, this seman- 
tics is structured entirely differently from the denotational mod- 
els previously defined for related languages. 

• Metatheory of Concurrent NetCore: The metatheory includes a 
type system and its proof of soundness, as well as several auxil- 
iary properties of the system, such as confluence and normaliza- 
tion of all well-typed policies. We derive reasoning principles 
relating the small-step CNC semantics to a NetKAT-like deno- 
tational model. 

• Multipass compilation algorithm: We show how to compile 
high-level abstract configurations into the constrained lower- 
level concrete configuration of the RMT pipeline [5]. In doing 
so, we show how to produce policy transformation functions 
that will map abstract policy updates into concrete policy up- 
dates. We have proven many of our compilation passes correct 
using reasoning principles derived from our semantics. We of- 
fer this compilation as a proof of concept of "transformations 
within CNC" as a compilation strategy; we believe that many 
of our algorithms and transformations will be reusable when 
targeting other platforms. 

A technical appendix is available that includes a full presentation 
of the compilation algorithm, theorems, and proofs [1]. 

1 Because we focus on programming individual switches in this paper, our 
language does not contain Kleene Star, which is more useful for specifying 
paths across a network than policies on a single switch. Hence, our language 
is a NetCore as opposed to a NetKAT. 




Figure 2. A simple network. 



The following section introduces CNC in greater detail through 
a series of examples, while Section 3 presents a formal semantics 
for CNC, and Section 4 describes its metatheory. The models of 
both the RMT and Intel FlexPipe architectures are described in 
Section 5, followed by our compilation algorithm in Section 6. 
Section 7 describes related work, and we conclude in Section 8. 

2. CNC by example 

In this section, we introduce CNC through a series of examples, 
starting with user policies that define high-level packet process- 
ing, and then showing how CNC can model low-level switching 
hardware. Because CNC can model both ends of the spectrum, it 
can serve as a common intermediate language within an OpenFlow 
2.0 compilation system. Section 6 will illustrate this idea via 
algorithms that demonstrate how to transform our high-level user 
policies into components for placement in RMT tables. 

2.1 Simple switch policies 

Consider the picture in Figure 2. This picture presents several 
devices (a switch, a controller, a server and a DPI box) as well 
as a link to "the internet." The switch has four ports (labelled 1, 2, 
3, 4 in the picture) that connect it to the other devices and to the 
internet. Our goal is to write a policy for the switch to specify how 
it forwards packets in and out of its ports. 

In general, we model packets as records with a number of fields 
which map to values drawn from a finite set. Our examples typi- 
cally use an idealized collection of fields such as src (the packet's 
source IP address), in (the port the packet arrives on), and out (the 
port a packet should leave on). Switch policies are functions that 
map packets to sets of packets. For example, a policy that drops all 
packets will map any packet to the empty set of packets. A policy 
that forwards packets from the internet (port 1) to the server (port 
2) will map packets with in = 1 to a packet with out = 2. A policy 
that forwards packets from the internet to both the DPI box and 
the server will map packets with in = 1 to a pair of packets with 
out = 2 and out = 3. 

We build our policies out of a collection of primitive opera- 
tions and policy combinators. The simplest primitive filters packets 
based on the contents of a single field. For example, when applied 
to a packet pk, the test src = 10.0.0.1 returns {pk} when pk's 
src field is 10.0.0.1 and returns the empty set of packets otherwise. 
Using such tests as well as standard boolean connectives and (;), 
or (+) and not (¬), one can easily build up a function on packets 
that implements a firewall (either dropping each packet or returning 
it unchanged). For example, we might want to implement the following 
firewall w on the switch in Figure 2. It admits ssh or http 
traffic on port 1, but blocks all other traffic arriving on port 1. All 



2 DPI is deep packet inspection, a form of network security monitoring that 
inspects not just packet headers but their payloads as well. 

3 Using in and out fields to designate ingress and egress ports rather than 
a single port field that indicates where the packet is "right now" deviates 
slightly from past presentations of NetCore and NetKAT, but more faithfully 
models our hardware targets. 
traffic on ports other than 1 is allowed. 

w = in = 1; (typ = ssh + typ = http) + ¬(in = 1) 

In order to make changes to packets, we use the assignment 
primitive f ← value. Complex policies may perform the actions 
of a set of simpler policies in series using the sequential 
composition operator (p_1; p_2). Alternatively, a policy may copy a packet 
and perform both p_1 and p_2 on the separate copies, taking the union 
of their results (p_1 + p_2). We have reused the symbols ";" and "+" 
(conjunction and disjunction) here because their semantics as 
logical predicates coincides with their semantics as policy 
combinators (the boolean algebra is a sub-algebra of the policy algebra). 

As an example, to define a static routing policy r for our switch, 
we might write the following policy. 

r = in = 1; out ← 2 + in = 2; out ← 1 

The policy above has the effect of routing packets from port 1 to 
port 2 and from port 2 to port 1. In more detail, it first copies the 
incoming packet (+). Then, in the first branch, it tests whether the 
input port is 1; if not, the packet is dropped; if so, the out field is 
assigned 2. The second branch is dual, forwarding packets from 
port 2 to port 1. The guards on each branch guarantee that the 
'copying' is purely notional; in general, one codes the conditional 
statement if a then p_1 else p_2 as a; p_1 + ¬a; p_2. 

The above features are not new — they are present in Net- 
Core [11, 12] and NetKAT [3]. However, in order to serve as a 
configuration language for OpenFlow 2.0, we require a couple of 
additional features, as well as the development of a simple type 
system for policies. First, the policies so far are completely static. 
They offer no room for populating new packet-processing rules at 
run time. To admit this kind of dynamic extension of static policies, 
we add typed table variables, which we write (x : τ). For 
example, we write (x : ({typ, src}, {out})) to indicate that the 
controller may later install new rules in place of x, and any such 
rules will only read from the typ and src header fields and write 
to the out field. The controller could use this table to dynamically 
install rules that forward selected subsets of packets to the DPI box 
for additional scrutiny. The typing information informs the switch 
of the kind of memory it needs to reserve for the table x (in this 
case, memory wide enough to be able to hold patterns capable of 
matching on both the typ and src fields). We model rule population 
as a set of table bindings b, i.e., a closing substitution. 
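
The idea of a closing substitution that respects table types can be sketched as follows. This is a simplified illustration under stated assumptions: policies are encoded as nested tuples (a hypothetical encoding chosen for this example), and the check only compares the fields a policy reads and writes against the table's declared sets; the actual typing rules of CNC are given later in the paper.

```python
# Hypothetical policy encoding:
#   ("test", f, v), ("assign", f, v), ("seq", p, q), ("par", p, q),
#   ("table", name, reads, writes)

def reads_writes(p):
    """Return (fields read, fields written) by a hole-free policy."""
    tag = p[0]
    if tag == "test":
        return {p[1]}, set()
    if tag == "assign":
        return set(), {p[1]}
    if tag in ("seq", "par"):
        r1, w1 = reads_writes(p[1])
        r2, w2 = reads_writes(p[2])
        return r1 | r2, w1 | w2
    raise ValueError(f"not hole-free: {tag}")


def fill(p, bindings):
    """Closing substitution: replace each table variable by its binding."""
    tag = p[0]
    if tag == "table":
        _, name, allowed_reads, allowed_writes = p
        body = bindings[name]
        r, w = reads_writes(body)
        if not (r <= allowed_reads and w <= allowed_writes):
            raise TypeError(f"policy for {name} exceeds its table type")
        return body
    if tag in ("seq", "par"):
        return (tag, fill(p[1], bindings), fill(p[2], bindings))
    return p


# m = (x : ({typ, src}, {out})), populated with a rule sending http to port 3
m = ("table", "x", {"typ", "src"}, {"out"})
dpi_rule = ("seq", ("test", "typ", "http"), ("assign", "out", 3))
closed = fill(m, {"x": dpi_rule})
```

A binding that writes a field outside the declared write set, such as ("assign", "dst", 0), is rejected with a TypeError in this sketch.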

A second key extension is concurrency, written p_1 ∥ p_2. In 
order to reduce packet-processing latency within a switch, one may 
wish to execute p_1 and p_2 concurrently on the same packet (rather 
than making copies). This is only legal provided there is no 
interference between subpolicies p_1 and p_2. In CNC, interference 
is prevented through the use of a simple type system. This type 
system prevents concurrent writes and ensures determinism of the 
overall packet-processing policy language. 

As an example, consider the following policy p, which assem- 
bles each of the components described earlier. This policy checks 
for compliance with the firewall w while concurrently implement- 
ing a routing policy. The routing policy statically routes all packets 
to the server (this is the role of r) while dynamically selecting those 
packets to send to the DPI box (this is the role of x). 

m = (x : ({typ, src}, {out})) 

p = w ∥ (r + m) 

In essence, we have a form of speculative execution here. The 
policy r + m is speculatively copying the packet and modifying its 
out field while the firewall decides whether to drop it. If the firewall 
ultimately decides to drop the packet, then the results of routing 
and monitoring are thrown away. If the firewall allows the packet, 
then we have already computed how many copies of the packet are 
going out which ports. This kind of speculative execution is safe 
and deterministic when policies are well-typed. 
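
As a rough illustration of why the example policy is safe to run concurrently, the sketch below checks two policies for interference given their read and write sets. The check is a simplified stand-in, assuming that interference means one side writes a field the other side reads or writes; the precise conditions are those of the type system presented later in the paper. The function name and the listed field sets are assumptions derived from the example above.

```python
def interferes(rw_p, rw_q) -> bool:
    """True if either side writes a field that the other reads or writes."""
    (reads_p, writes_p), (reads_q, writes_q) = rw_p, rw_q
    return bool(writes_p & (reads_q | writes_q)) or bool(
        writes_q & (reads_p | writes_p)
    )


# The firewall w only reads in and typ; it writes nothing.
w_rw = ({"in", "typ"}, set())
# r + m reads in, typ and src, and writes out.
rm_rw = ({"in", "typ", "src"}, {"out"})

print(interferes(w_rw, rm_rw))                        # False
print(interferes(({"out"}, set()), (set(), {"out"})))  # True: read/write race
```

Under this reading, the firewall and the routing policy touch disjoint writable state, so the order in which their steps interleave cannot change the result.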

2.2 Modeling programmable hardware architectures 

In addition to providing network administrators with a language for 
defining policies, our language of network policies aptly describes 
the hardware layout of switches' packet-processing pipelines. In 
this guise, table variables represent TCAM or SRAM tables, and 
combinators describe how these hardware tables are connected. 
The key benefit to devising a shared language for describing both 
user-level programs and hardware configurations is that we can de- 
fine compilation as a semantics-preserving policy translation prob- 
lem, and compiler correctness as a simple theorem about equiva- 
lence of input and output policies defined in a common language. 
Below, we demonstrate how to model key elements of the RMT [5] 
and FlexPipe [14] architectures. Both chips offer differently archi- 
tectured fixed pipelines connecting reconfigurable tables. 

RMT. In RMT (as well as in FlexPipe), multicast is treated spe- 
cially: the act of copying and buffering multiple packets during a 
multicast while processing packets as quickly as they come in ("at 
line rate") is the most difficult element of chip design. 

The RMT multicast stage consists of a set of queues, one per 
output port. Earlier tables in the pipeline indicate the ports on which 
a packet should be multicast by setting bits in a metavariable bitmap 
we call out_i. The multicast stage consists of a sum, where each 
summand corresponds to a queue on a particular output port: when 
the ith out bit is set, the summand tags the packet with a unique 
identifier and sets its output port out to i accordingly. 

multicast = (out_1 = 1; f_tag ← v_1; out ← 1) 
          + (out_2 = 1; f_tag ← v_2; out ← 2) 
          + ... 
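
The behavior of this stage on a single packet can be sketched in Python. The encoding is an assumption made for illustration: packets are dictionaries, the bitmap is stored as fields named out_1, out_2, and so on, a bit counts as set when it equals 1, and the per-port identifiers are rendered as the strings "v1", "v2", and so on.

```python
def multicast(pk: dict, ports) -> list:
    """Emit one tagged copy of pk for every port whose out bit is set."""
    copies = []
    for i in ports:
        if pk.get(f"out_{i}") == 1:
            # Each summand tags the copy and sets its output port to i.
            copies.append({**pk, "tag": f"v{i}", "out": i})
    return copies


pk = {"src": "10.0.0.1", "out_1": 1, "out_2": 0, "out_3": 1}
for copy in multicast(pk, ports=[1, 2, 3]):
    print(copy["out"], copy["tag"])
```

Each summand in the formal policy corresponds to one loop iteration here; a packet with no bits set yields no copies, matching the empty sum.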

In addition to the multicast processor, the RMT architecture pro- 
vides thirty-two physical tables, which may be divided into se- 
quences in the ingress and egress pipelines. Overall, the RMT 
pipeline consists of the ingress pipeline, followed by the multicast 
stage, followed by the egress pipeline. 

pipeline = (x_1 : τ_1); ...; (x_k : τ_k); 
           multicast; 
           (x_{k+1} : τ_{k+1}); ...; (x_32 : τ_32) 

FlexPipe. The FlexPipe architecture makes use of concurrency by 
arranging its pipeline into a diamond shape. Each point of the 
diamond is built from two tables in sequence, with incoming packets 
first processed by the first pair, then concurrently by the next 
two pairs, and finally by the last pair. This built-in concurrency 
optimizes for common networking tasks, such as checking pack- 
ets against an access control list while simultaneously calculating 
routing behavior. 

pair_i = (x_{i,1} : τ_{i,1}); (x_{i,2} : τ_{i,2}) 
diamond = pair_1; (pair_2 ∥ pair_3); pair_4 

The FlexPipe multicast stage occurs after the diamond pipeline 
and, like the RMT multicast stage, relies on metadata set in the 
ingress pipeline to determine multicast. FlexPipe can make up to 
five copies ("mirrors") of the packet that can be independently 
modified, but each copy can be copied again to any output port, 
so long as no further modifications are required. 

multicast = mirror; egress; flood 
pipeline = diamond; multicast 

We present models of both RMT and FlexPipe (including mirror, 
egress and flood) in greater detail in Section 5. 

Figure 3. Packets, types, and predicate/policy syntax 



3. Concurrent NetCore 

We define the syntax of Concurrent NetCore in Figure 3. The lan- 
guage is broken into two levels: predicates and policies. Predicates, 
written with the metavariables a and b, simply filter packets with- 
out modifying or copying them. Policies, written with the metavari- 
ables p and q, can (concurrently) modify and duplicate packets. Ev- 
ery predicate is a policy — a read-only one. Both policies and pred- 
icates are interpreted using a set semantics, much like NetKAT [3]. 
Policies are interpreted as functions from sets of packets to sets 
of packets, while predicates have two interpretations: as functions 
from sets of packets to sets of packets, but also as Boolean 
propositions selecting a subset of packets. A packet, written with the 
metavariable pk, is a finite partial function from fields to values. We 
fix a set of fields F, from which we draw individual fields f. We 
will occasionally refer to sets of fields using the metavariables R 
and W when they denote sets of readable or writable fields, 
respectively. We do not have a concrete treatment for values v ∈ Val, 
though Val must be finite and support a straightforward notion of 
equality. One could model both equality and TCAM-style wildcard 
matching, but for simplicity's sake, we stick with equality only. 

As explained in Section 2, the policies of Concurrent NetCore 
include the predicates as well as primitives for field modification, 
tables (x : τ), sequential composition (;), parallel composition (+), 
and concurrency (∥). One difference from our informal presentation 
earlier is that concurrent composition p W_p∥W_q q formally requires 
a pair of write sets W_p and W_q, where W_p denotes the set of 
fields that p may write and W_q denotes the set of fields that q may 
write. Our operational semantics in Section 3.1 will in fact get stuck 
if p and q have a race condition, e.g., have read/write dependencies. 

Table variables (x : τ) are holes in a policy to be filled in by 
the controller with an initial policy, which the controller updates 
as the switch processes packets. The type τ = (R, W) constrains 
the fields that the table may read from (R) and write to (W). For 
example, the rules that populate the table (x : ({src, typ}, {dst})) 
can only ever read from the src and typ fields and can only ever 
write to the dst field. In practice, this means that the controller 
can substitute in for x any policy matching its type (or with a more 
restrictive type). 



A note on packet field dependences. Packet formats often have 
complex dependencies, e.g., if the Ethertype field is 0x800, then 
the Ethernet header is followed by an IP protocol header. Switches 
handle attempts to match or modify a missing field at run time, 
although the specific behavior varies by target architecture. In the 
RMT chip, for instance, there is a valid bit indicating the presence 
(or absence) of each possible field. In OpenFlow 1.0 architectures, 
matching against a missing field always succeeds. In both cases, 
writing to a missing field is treated as a non-operation. Hence, 
we assume that each packet arriving at each switch contains fields 
f_1, ..., f_k, although in practice the value associated with each field 
(which we treat abstractly) may be a distinguished "not present" 
value. 

3.1 Small-step operational semantics 

We give a small-step semantics for closed policies, i.e., policies 
where table variables have been instantiated with concrete policies. 

Just like the switches we are modeling, our policies actually 
work on packets one at a time: switches take an input packet and 
produce a (possibly empty) set of (potentially modified) output 
packets. As a technical convenience, our operational semantics 
generalizes this, modeling policies as taking a set of packets to a set 
of packets. Making this theoretically expedient choice — as we will 
show in Lemma 3 — doesn't compromise our model's adequacy. 

While other variants of NetCore/NetKAT use a denotational 
semantics, we use a completely new small-step, operational 
semantics in order to capture the interleavings of concurrent reads 
and writes of various fields of a packet. The interaction between 
(nested) concurrent processing of shared fields and packet-copying 
parallelism is quite intricate and hence deserves a faithful, fine- 
grained operational model. In Section 4, we define a type system 
that guarantees the strong normalization of all concurrent execu- 
tions, and show that despite the concurrency, we can in fact use a 
NetKAT-esque set-theoretic denotational semantics to reason about 
policies at a higher level of abstraction if we so choose. 

Using PK to range over sets of packets, we define the states
$\sigma$ for the small-step operational semantics $\sigma \to \sigma'$ in Figure 3.
These states $\sigma = \langle p, \delta \rangle$ are pairs of a policy $p$ and a packet tree $\delta$.
Packet trees represent the state of packet processing: which packets, 
or packet components, are the different branches of the parallel and 
concurrent compositions working on? When processing a negation, 
from what set of packets will we take the complement? 

The leaves of packet trees are of the form (PK, W), where PK 
is a set of packets and W is a set of fields indicating the current 
write permission. The write permission indicates which fields may 
be written; other fields present in the packets $pk \in PK$ may be read
but not written. Packet processing is done when we reach a terminal
state, $\langle id, \langle PK, W \rangle \rangle$.

There are three kinds of packet tree branches. The packet tree
branch $(\mathsf{par}\ \delta_1\ \delta_2)$ represents a parallel composition $p + q$ where $p$
is operating on $\delta_1$ and $q$ is operating on $\delta_2$. The packet tree branch
$(\mathsf{not}_{PK}\ \delta)$ represents a negation $\neg a$ where $a$ is running on $\delta$;
when $a$ terminates with some set of packets $PK'$, we will compute
$PK \setminus PK'$, i.e., those packets not satisfying $a$. The packet tree branch
$(\mathsf{con}_W\ \delta_1\ \delta_2)$ represents a concurrent composition $p \mathbin{{}_{W_p}\|_{W_q}} q$
where $p$ works on $\delta_1$ with write permission $W_p$ and $q$ works on $\delta_2$
with write permission $W_q$. We also store $W$, the write permission
before concurrent processing, so we can restore it when $p$ and $q$ are
done processing.

We write $\sigma \to \sigma'$ to mean that the state $\sigma$ performs a step of
packet processing and transitions to the state $\sigma'$. Packet processing
modifies the packets in a state and/or reduces the term. The step
relation relies on several auxiliary operators on packets and packet
sets. We read $pk[f := v]$ as "update packet $pk$'s $f$ field with the
value $v$"; $pk \setminus F$ as "packet $pk$ without the fields in $F$"; and
$PK \setminus F$ as "those packets in $PK$ without the fields in $F$", which
lifts $pk \setminus F$ to sets of packets. Finally, we pronounce $\times$ as "cross
product". Notice that $PK \setminus F$ only produces the empty set when $PK$
is itself empty: if every packet $pk \in PK$ has only fields in $F$, then
$PK \setminus F = \{\bot\}$, the set containing the empty packet. Such a packet
set is not entirely trivial, as there remains one policy decision to
be made about such a set of packets: drop (using drop) or forward
(using id)? On the other hand, $\emptyset \times PK = PK \times \emptyset = \emptyset$.
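The packet operators just described can be sketched in a few lines of Python. This is only an illustration: the encoding of a packet as a frozenset of (field, value) pairs and the names `make_packet`, `update`, `remove`, `remove_all` and `EMPTY_PACKET` are assumptions of the sketch, not part of the calculus.

```python
# Packets are modelled as frozensets of (field, value) pairs so that they
# can be members of Python sets; this encoding is an assumption.

def make_packet(fields):
    """Build a packet from a dict mapping field names to values."""
    return frozenset(fields.items())

def update(pk, f, v):
    """pk[f := v]: set field f to v, keeping every other field."""
    return frozenset((g, w) for g, w in pk if g != f) | {(f, v)}

def remove(pk, F):
    """The packet pk without the fields in F."""
    return frozenset((g, w) for g, w in pk if g not in F)

def remove_all(PK, F):
    """Lift remove to a set of packets."""
    return {remove(pk, F) for pk in PK}

EMPTY_PACKET = frozenset()  # stands in for the empty packet

pk = make_packet({"src": 1, "dst": 2})
print(update(pk, "dst", 9) == make_packet({"src": 1, "dst": 9}))  # True
print(remove_all({pk}, {"src", "dst"}) == {EMPTY_PACKET})          # True
print(remove_all(set(), {"src"}) == set())                         # True
```

The last two lines mirror the observation in the text: removing every field from a non-empty packet set leaves the singleton set of the empty packet, and only an empty input yields the empty set.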

With these definitions to hand, we define the step relation in Figure 4.
The following invariants of evaluation and well typed policies
may be of use while reading through Figure 4.

• Policy evaluation begins with a leaf $\langle PK, W \rangle$ and ends with a
leaf $\langle PK', W \rangle$ with the same write permissions $W$.

• Policies may modify the values of existing fields within packets, 
but they cannot introduce new packets nor new fields — policies 
given the empty set of packets produce the empty set of packets. 

The first few rules are straightforward. The (Drop) rule drops all its
input packets, yielding $\emptyset$. In (Match), a match $\langle f = v, \langle PK, W \rangle \rangle$
filters $PK$, producing those packets which have $f$ set to $v$. In
(Modify), a modification $\langle f \leftarrow v, \langle PK, W \rangle \rangle$ updates packets with
the new value $v$. Both (Match) and (Modify) can get stuck: the
former if $f$ is not defined for some packet, and the latter if the
necessary write permission ($f \in W$) is missing.

Sequential processing for p; q is simpler: we run p to completion 
(SeqL), and then we run q on the resulting packets (SeqR). A 
special packet tree branch is not necessary, because q runs on 
any and all output that p produces. Intuitively, this is the correct 
behavior with regard to drop: if p drops all packets, then q will run 
on no packets, and will therefore produce no packets. 

The parallel composition p + q is processed on (PK,W) in 
stages, like all of the remaining rules. First, (ParEnter) introduces
a new packet tree branch, $(\mathsf{par}\ \langle PK, W \rangle\ \langle PK, W \rangle)$, duplicating the
original packets: one copy for $p$ and one for $q$. ParL and ParR
step p and q in parallel, each modifying its local packet tree. When 
both p and q reach a terminal state, ParExit takes the union of 
their results. Note that ParExit produces the identity policy, id, 
in addition to combining the results of executing p and q, and we 
restore the initial write permissions W. As with NetKAT, p + q has 
a set semantics, rather than bag semantics. If p and q produce an 
identical packet pk, only one copy of pk will appear in the result. 

Negation $\neg a$, like parallel composition, uses a special packet
tree branch (not), in this case to keep a copy of the original
packets. Running on $PK$, we first save a copy of $PK$ in the
packet tree $(\mathsf{not}_{PK}\ \langle PK, W \rangle)$ (NotEnter), preserving the write
permissions. We then run $a$ on the copied packets (NotInner).
When $a$ finishes with some $PK_a$, we look back at our original
packets and return the saved packets not in $PK_a$ (NotExit).

Concurrent composition is the most complicated of all our policies.
To run the concurrent composition $p \mathbin{{}_{W_p}\|_{W_q}} q$ on packets $PK$
with write permissions $W$, we first construct an appropriate packet
tree (ConEnter). We split the packets based on two sets of fields:
those written by $p$, $W_p$, and those written by $q$, $W_q$. We also store
the original write permissions $W$ (a technicality necessary for the
metatheory, since in well typed programs $W = W_p \cup W_q$; see
(Con) in the typing rules in Figure 5, Section 4). The sub-policies
$p$ and $q$ run on restricted views of $PK$, where each side can (a) read
and write its own fields, and (b) read fields not written by the other.
To achieve (a), we split $W$ between the two. To achieve (b), we remove
certain fields from each side: the sub-policy $p$ will process
$PK \setminus W_q$ under its own write permission $W_p$ (ConL), while the
sub-policy $q$ will process $PK \setminus W_p$ under its own write permission
$W_q$ (ConR). Note that it is possible to write bad sets of fields for
$W_p$ and $W_q$ in three ways: by overlapping, with $W_p$ and $W_q$ sharing
fields (stuck in (ConEnter)); by dishonesty, where $p$ tries to
write to a field not in $W_p$ (stuck later in (Modify)); and by mistake,
with $p$ reading from a field in $W_q$ (stuck later in (Match)).
While evaluation derivations of such erroneous programs will get
stuck, our type system rules out such programs (Lemma 1). When
both sides have terminated, we have sets of packets $PK_p$ and $PK_q$,
the result of $p$ and $q$ processing fragments of packets and concurrently
writing to separate fields. We must then reconstruct a set of
complete packets from these fragments. In (ConExit), the cross
product operator $\times$ merges the writes from $PK_p$ and $PK_q$. We take
every possible pair of packets $pk_p$ and $pk_q$ from $PK_p$ and $PK_q$ and
construct a packet with fields derived from those two packets. (It
is this behavior that leads us to call it the 'cross product'.) In the
merged packet $pk$, there are three ways to include a field:

1. We set $pk.f$ to be $pk_p.f$ when $f \notin \mathrm{Dom}(pk_q)$. That is, $f$ is in
$W_p$ and may have been written by $p$.

2. We set $pk.f$ to be $pk_q.f$ when $f \notin \mathrm{Dom}(pk_p)$. Here, $f \in W_q$,
and $q$ may have written to it.

3. We set $pk.f$ to $pk_p.f$, which is equal to $pk_q.f$. For a field $f$ to be
found in both packets, it must be that $f \notin W_p \cup W_q$; that is, $f$
was not written at all.
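The three cases above can be read as a merge function on packet fragments. The following Python sketch is illustrative only: packets are encoded as frozensets of (field, value) pairs, the names `merge` and `cross` are hypothetical, and skipping pairs whose shared fields disagree is an assumption of the sketch (the text only describes the case where shared fields are equal).

```python
def merge(pk_p, pk_q):
    """Merge two packet fragments field by field.

    Returns None when a field present in both fragments has different
    values (an assumption: such pairs are not combined).
    """
    p, q = dict(pk_p), dict(pk_q)
    merged = {}
    for f in set(p) | set(q):
        if f not in q:            # case 1: only p's fragment has f
            merged[f] = p[f]
        elif f not in p:          # case 2: only q's fragment has f
            merged[f] = q[f]
        elif p[f] == q[f]:        # case 3: unwritten field seen by both
            merged[f] = p[f]
        else:
            return None
    return frozenset(merged.items())

def cross(PK_p, PK_q):
    """Cross product of two sets of packet fragments."""
    out = set()
    for a in PK_p:
        for b in PK_q:
            m = merge(a, b)
            if m is not None:
                out.add(m)
    return out

# p wrote f1 on a view without f2; q wrote f2 on a view without f1.
PK_p = {frozenset({"f1": 1, "g": 7}.items())}
PK_q = {frozenset({"f2": 2, "g": 7}.items())}
print(cross(PK_p, PK_q) == {frozenset({"f1": 1, "f2": 2, "g": 7}.items())})  # True
print(cross(PK_p, set()) == set())                                            # True
```

The second call shows why a `drop` on one side empties the whole result: with no fragments on one side there are no pairs to merge.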

This accounts for each field in the new packet pk, but do we have 
the right number of packets? If p ran a parallel composition, it may 
have duplicated packets; if q ran drop, it may have no packets at all. 
One guiding intuition is that well typed concurrent compositions
$p \| q$ should be equivalent to $p; q$ and $q; p$. (In fact, all interleavings
of well typed concurrent compositions should be equivalent, but
sequential composition already gives us a semantics for the 'one 
side first' strategy.) The metatheory in Section 4 is the ultimate 
argument, but we can give some intuition by example: 

• Suppose that $PK = \{pk\}$ and that $p = f_1 \leftarrow v_1$ and
$q = f_2 \leftarrow v_2$ update separate fields. In this case $PK_p =
\{(pk \setminus \{f_2\})[f_1 := v_1]\}$ and $PK_q = \{(pk \setminus \{f_1\})[f_2 := v_2]\}$.
Taking $PK_p \times PK_q$ yields a set containing a single packet $pk'$,
where $pk'(f_1) = v_1$ and $pk'(f_2) = v_2$, but $pk'(f) = pk(f)$ for
all other fields $f$, just as if we ran $p; q$ or $q; p$.

• Suppose that $p = id$ and $q = drop$. When we take $PK_p \times PK_q$,
there are no packets at all in $PK_q$, and so there is no output.
This is equivalent to running id; drop or drop; id.

• Suppose that $p = f_1 \leftarrow v_1 + f_1 \leftarrow v_1'$ and $q = f_2 \leftarrow v_2$.
Running $p \mathbin{{}_{\{f_1\}}\|_{\{f_2\}}} q$ on $PK$ will yield

$$\begin{aligned}
PK_p &= \{pk[f_1 := v_1] \mid pk \in PK \setminus \{f_2\}\} \cup \{pk[f_1 := v_1'] \mid pk \in PK \setminus \{f_2\}\} \\
PK_q &= \{pk[f_2 := v_2] \mid pk \in PK \setminus \{f_1\}\} \\
PK_p \times PK_q &= \{pk[f_1 := v_1][f_2 := v_2] \mid pk \in PK\} \cup \{pk[f_1 := v_1'][f_2 := v_2] \mid pk \in PK\}
\end{aligned}$$

which is the same as running $p; q$ or $q; p$.

We should note that $p \mathbin{{}_{W_p}\|_{W_q}} q$ is not the same as $p; q$ when $W_p$
and $W_q$ are incorrect, e.g., when $p$ tries to write a field $f \notin W_p$, or
when $q$ tries to read a field $f \in W_p$. Sequential composition may
succeed where concurrent composition gets stuck!

3.2 Modeling the SDN controller 

The operational semantics is defined on closed policies — that is, 
policies without table variables. At configuration time, the con- 
troller installs a (possibly open) policy on each switch, which tells 
the switch how to arrange its packet processing pipeline. Next, at 
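population time, the controller will send messages to the switch
instructing it to replace each abstract table variable with a concrete
(closed) policy, after which packet processing proceeds as described
by the operational semantics from Figure 4.

Packet operations

$$pk[f := v] = \lambda f'.\ \begin{cases} v & f' = f \\ pk(f') & \text{otherwise} \end{cases}
\qquad
pk \setminus F = \lambda f.\ \begin{cases} \bot & f \in F \\ pk(f) & \text{otherwise} \end{cases}
\qquad
PK \setminus F = \{pk \setminus F \mid pk \in PK\}$$

$$(pk_1 \times pk_2)(f) = \begin{cases} pk_1(f) & \text{when } f \notin \mathrm{Dom}(pk_2) \\ pk_2(f) & \text{when } f \notin \mathrm{Dom}(pk_1) \\ pk_1(f) = pk_2(f) & \text{otherwise} \end{cases}$$

Step rules (ParEnter, ParL, ConEnter)

$$\langle p + q, \langle PK, W \rangle \rangle \to \langle p + q, (\mathsf{par}\ \langle PK, W \rangle\ \langle PK, W \rangle) \rangle \quad (\text{ParEnter})$$

$$\frac{\langle p, \delta_p \rangle \to \langle p', \delta_p' \rangle}{\langle p + q, (\mathsf{par}\ \delta_p\ \delta_q) \rangle \to \langle p' + q, (\mathsf{par}\ \delta_p'\ \delta_q) \rangle} \quad (\text{ParL})$$

$$\frac{W_p \cap W_q = \emptyset \qquad W_p \cup W_q \subseteq W}{\langle p \mathbin{{}_{W_p}\|_{W_q}} q, \langle PK, W \rangle \rangle \to \langle p \mathbin{{}_{W_p}\|_{W_q}} q, (\mathsf{con}_W\ \langle PK \setminus W_q, W_p \rangle\ \langle PK \setminus W_p, W_q \rangle) \rangle} \quad (\text{ConEnter})$$

Figure 4. Concurrent NetCore operational semantics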



Definition 1. Population-time updates and closing functions.

Population-time updates    $b \in \mathrm{Var} \rightharpoonup \mathrm{Policy}$

Closing functions    $T_b \in \mathrm{Policy} \to \mathrm{Policy}$

We model population-time updates as partial functions mapping 
table variables to closed policies. The function $T_b(p)$ structurally
recurses through a policy $p$, replacing each table variable $x$ with
$b(x)$. That is, the policy $p$ is a configuration-time specification, and
$T_b(p)$ is an instance of that specification populated according to the
update function b. Population-time updates and closing functions 
will play a large role in Section 6, when we present a compilation 
algorithm for transforming a policy (and subsequent updates) to fit 
on a fixed target architecture. 
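A closing function is a structural recursion over the policy syntax, which the following Python sketch makes concrete. The AST classes `Var`, `Prim` and `Bin`, the function name `close`, and the choice to leave a variable in place when the update is undefined on it are all assumptions made for illustration; write-set annotations on concurrent composition are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:          # a table variable x
    name: str

@dataclass(frozen=True)
class Prim:         # id, drop, a match, or a modification (no variables inside)
    text: str

@dataclass(frozen=True)
class Bin:          # p + q, p; q, or p || q
    op: str
    left: object
    right: object

def close(b, p):
    """Apply the update b (a dict from variable names to closed policies)
    to the policy p, recursing through its structure."""
    if isinstance(p, Var):
        # b is partial: unmapped variables are left untouched (assumption).
        return b.get(p.name, p)
    if isinstance(p, Bin):
        return Bin(p.op, close(b, p.left), close(b, p.right))
    return p

config = Bin(";", Var("x"), Bin("+", Var("y"), Prim("drop")))
update = {"x": Prim("in = 1"), "y": Prim("out <- 2")}
print(close(update, config))
```

Here `config` plays the role of the configuration-time specification and `close(update, config)` the populated instance.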

4. Metatheory 

The operational semantics of Section 3.1/Figure 4 defines the be- 
havior of policies on packets. A number of things can cause the 
operational semantics to get stuck, which is how we model errors: 

1. Unsubstituted variables: they have no corresponding rule.

2. Reads of non-existent fields: (Match) can't apply if there are
packets $pk \in PK$ such that $f \notin \mathrm{Dom}(pk)$, as might happen if
(ConEnter) were to split packets incorrectly.

3. Writes to fields without write permission: (Modify) only allows
writes to a field $f$ if $f \in W$.

4. Race conditions: concurrency splits the packet tree based
on the write permissions of its subpolicies, and incorrect annotations
can lead to stuckness via being unable to apply
(ConEnter), which requires that $W_p \cap W_q = \emptyset$, or via getting
stuck on (2) or (3) later in the evaluation due to the reduced
fields and permissions each concurrent sub-policy runs with.

We define a type system in Figure 5, with the aim that well typed
programs won't get stuck, a property we show in our proof of
normalization, Lemma 1. First, we define entirely standard typing
contexts, $\Gamma$. We will only run policies typed in the empty environment,
i.e., with all of their tables filled in. Before offering typing rules for
policies, we define well formedness of types and typing of packet
sets. A type $\tau = (R, W)$ is well formed if $R$ and $W$ are subsets of a
globally fixed set of fields $F$ and if $R \cap W$ is empty. A set of packets
$PK$ conforms to a type $\tau = (R, W)$ if every packet $pk \in PK$ has
at least those fields in $R \cup W$.
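Well formedness and conformance are simple set conditions, sketched below in Python. The global field set `FIELDS`, the packet encoding as frozensets of (field, value) pairs, and the function names are assumptions for illustration.

```python
FIELDS = frozenset({"in", "out", "typ"})   # hypothetical global field set F

def well_formed(R, W, F=FIELDS):
    """A type (R, W) is well formed if R and W lie within F and are disjoint."""
    return R <= F and W <= F and not (R & W)

def conforms(PK, R, W):
    """PK conforms to (R, W) if every packet has at least the fields in R | W."""
    return all((R | W) <= {f for f, _ in pk} for pk in PK)

pk = frozenset({"in": 1, "out": 0, "typ": "ssh"}.items())
print(well_formed({"in"}, {"out"}))            # True
print(well_formed({"in", "out"}, {"out"}))     # False: R and W overlap
print(conforms({pk}, {"in"}, {"out"}))         # True
```

Note that the empty packet set conforms to every type, since the condition quantifies over the packets in the set.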

The policies id and drop can both be typed at any well formed
type, by (Id) and (Drop), respectively. Table variables $(x : \tau)$
are typed at their annotations, $\tau$. The matching policy $f = v$ is
well typed at $\tau$ when $f$ is readable or writable (Match). Similarly,
$f \leftarrow v$ is well typed at $\tau$ when $f$ is writable in $\tau$ (Modify).

$$\Gamma ::= \cdot \mid \Gamma, (x : \tau)$$

$$(R_1, W_1) \cup (R_2, W_2) = ((R_1 \setminus W_2) \cup (R_2 \setminus W_1),\ W_1 \cup W_2)$$

$$\frac{\Gamma \vdash p : (R_p, W_p) \qquad \Gamma \vdash q : (R_q, W_q) \qquad W_p \cap W_q = \emptyset \qquad W_p \cap R_q = \emptyset \qquad R_p \cap W_q = \emptyset}{\Gamma \vdash p \mathbin{{}_{W_p}\|_{W_q}} q : (R_p, W_p) \cup (R_q, W_q)} \quad (\text{Con})$$

Figure 5. Concurrent NetCore typing rules

Negations are well typed at $\tau = (R, W)$ by (Not) when $a$ is
well typed at the read-only version of $\tau$, i.e., $(R, \emptyset)$. We restrict the
type to being read-only to reflect the fact that (a) only predicates
can be negated, and (b) predicates never modify fields.

If $p$ is well typed at $\tau_1$ and $q$ is well typed at $\tau_2$, then their
parallel composition $p + q$ is well typed at $\tau_1 \cup \tau_2$. Union on types
is defined in Figure 5 as taking the highest privileges possible: the
writable fields of $\tau_1 \cup \tau_2$ are those that were writable in either $\tau_1$
or $\tau_2$; the readable fields of the union are those fields that were
readable in one or both types but weren't writable in either type.
We give their sequential composition the same type.

Concurrent composition has the most complicated type: we
must add (conservative) conditions to prevent races. Suppose
$\Gamma \vdash p : (R_p, W_p)$ and $\Gamma \vdash q : (R_q, W_q)$. We require that:

• There are no write-write dependencies between $p$ and $q$
($W_p \cap W_q = \emptyset$), a requirement of (ConEnter).

• There are no read-write or write-read dependencies between $p$
and $q$ ($W_p \cap R_q = \emptyset$ and $R_p \cap W_q = \emptyset$). This guarantees that
(Match) won't get stuck trying to read a field that isn't present.

If these conditions hold, then we say the concurrent composition is
well typed: $\Gamma \vdash p \mathbin{{}_{W_p}\|_{W_q}} q : (R_p, W_p) \cup (R_q, W_q)$. Note that this
means that the $W$ stored in the con packet tree will be $W_p \cup W_q$,
and well typed programs meet the $W_p \cup W_q \subseteq W$ requirement
of (ConEnter) exactly. These conditions are conservative: some
concurrent compositions with overlapping reads and writes are
race-free. We use this condition for a simple reason: switches make
similar disjointness restrictions on concurrent tables.

Two metatheorems yield a strong result about our calculus:
strong normalization. We first prove well typed policies are
normalizing when run on well typed leaves $\langle PK, W \rangle$: they reduce to
the terminal state $\langle id, \langle PK', W \rangle \rangle$ with some other, well typed set of
packets $PK'$ and the same write permissions $W$.

Lemma 1 (Normalization). If $\vdash \tau = (R, W)$ and $\vdash PK : \tau$ and $\cdot \vdash p : \tau$
then $\langle p, \langle PK, W \rangle \rangle \to^* \langle id, \langle PK', W \rangle \rangle$ such that

1. $\vdash PK' : \tau$, and

2. $PK' \setminus W \subseteq PK \setminus W$.

Proof. By induction on the policy $p$, leaving $\tau$ general. The only
difficulty is showing that (ConExit) can always successfully merge
the results of well typed concurrency, which can be seen by a
careful analysis of the cross product, using part (2) of the IH to
show that fields not in the write permission $W$ are "read-only". □

Next we show that our calculus is confluent, even for ill typed
terms. This result may be surprising at first, but observe that
concurrency is the only potential hitch for confluence. A concurrent
composition with an annotation that conflicts with the reads and writes
of its sub-policies will get stuck before ever running (ConExit).
Even ill typed programs will be confluent; they just might not be
confluent at terminal states. We can imagine an alternative semantics
where concurrency really worked on shared state; in that formulation,
only well typed programs would be confluent.

Lemma 2 (Confluence). If $\sigma \to^* \sigma_1$ and $\sigma \to^* \sigma_2$ then there
exists $\sigma'$ such that $\sigma_1 \to^* \sigma'$ and $\sigma_2 \to^* \sigma'$.

Proof. By induction on the derivation of $\sigma \to^* \sigma_1$, proving the
(stronger) single-step diamond property first. □



Normalization and confluence yield strong normalization. Even 
though our small-step operational semantics is nondeterministic,
well typed policies terminate deterministically. We can in fact do 
one better: our small-step semantics (without concurrency) co- 
incides exactly with the denotational semantics of NetKAT [3], 
though we (a) do away with histories, and (b) make the quantifi- 
cation in the definition of sequencing explicit. Since our policies 
are 'switch-local', we omit Kleene star. 

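Lemma 3 (Adequacy). If $\cdot \vdash p : \tau = (R, W)$ with no concurrency,
then for all packets $\vdash PK : \tau$, if $\langle p, \langle PK, W \rangle \rangle \to^* \langle id, \langle PK', W \rangle \rangle$
then $PK' = \bigcup_{pk \in PK} [\![p]\!]\ pk$, where:

$$\begin{aligned}
[\![p]\!] &\in PK \to \mathcal{P}(PK) \\
[\![id]\!]\ pk &= \{pk\} \\
[\![drop]\!]\ pk &= \emptyset \\
[\![f = v]\!]\ pk &= \begin{cases} \{pk\} & pk(f) = v \\ \emptyset & \text{otherwise} \end{cases} \\
[\![f \leftarrow v]\!]\ pk &= \{pk[f := v]\} \\
[\![\neg a]\!]\ pk &= \{pk\} \setminus ([\![a]\!]\ pk) \\
[\![p + q]\!]\ pk &= [\![p]\!]\ pk \cup [\![q]\!]\ pk \\
[\![p; q]\!]\ pk &= \bigcup_{pk' \in [\![p]\!]\ pk} [\![q]\!]\ pk'
\end{aligned}$$

Proof. By induction on $\cdot \vdash p : \tau$.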



□ 
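The denotational semantics in Lemma 3 translates almost line for line into a small interpreter. The sketch below is illustrative: policies are encoded as tagged tuples and packets as frozensets of (field, value) pairs, and the names `den`, `run` and `update` are assumptions rather than notation from the paper. Concurrent composition is deliberately not handled, matching the lemma's restriction.

```python
def update(pk, f, v):
    """pk[f := v] on packets encoded as frozensets of (field, value) pairs."""
    return frozenset((g, w) for g, w in pk if g != f) | {(f, v)}

def den(p, pk):
    """Denotation of policy p on a single packet pk: a set of packets."""
    tag = p[0]
    if tag == "id":
        return {pk}
    if tag == "drop":
        return set()
    if tag == "match":                      # f = v
        _, f, v = p
        return {pk} if (f, v) in pk else set()
    if tag == "mod":                        # f <- v
        _, f, v = p
        return {update(pk, f, v)}
    if tag == "not":                        # negation of a predicate
        return {pk} - den(p[1], pk)
    if tag == "par":                        # p + q
        return den(p[1], pk) | den(p[2], pk)
    if tag == "seq":                        # p; q
        return {out for mid in den(p[1], pk) for out in den(p[2], mid)}
    raise ValueError("unsupported policy: %r" % (p,))

def run(p, PK):
    """Lift den to a set of input packets by taking the union."""
    return set().union(*(den(p, pk) for pk in PK))

# The routing policy r: in = 1; out <- 2 + in = 2; out <- 1
r = ("par",
     ("seq", ("match", "in", 1), ("mod", "out", 2)),
     ("seq", ("match", "in", 2), ("mod", "out", 1)))
pk = frozenset({"in": 1, "out": 0}.items())
print(run(r, {pk}) == {frozenset({"in": 1, "out": 2}.items())})  # True
```

Because results are Python sets, duplicate outputs of the two branches of `+` collapse automatically, which matches the set (rather than bag) semantics described earlier.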



The set-based reasoning principles offered by the denotational
semantics are quite powerful. We can in fact characterize the
behavior of well typed concurrent compositions as:

$$\begin{aligned}
p \mathbin{{}_{W_p}\|_{W_q}} q &\equiv p; q && \text{(Lemma 5)} \\
&\equiv q; p && \text{(Lemma 4)}
\end{aligned}$$



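Lemma 4 (Concurrency commutes). If $\vdash PK : \tau$ then
$\cdot \vdash p \mathbin{{}_{W_p}\|_{W_q}} q : \tau$ and $\langle p \mathbin{{}_{W_p}\|_{W_q}} q, PK \rangle \to^* \langle id, PK' \rangle$ iff
$\cdot \vdash q \mathbin{{}_{W_q}\|_{W_p}} p : \tau$ and $\langle q \mathbin{{}_{W_q}\|_{W_p}} p, PK \rangle \to^* \langle id, PK' \rangle$.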



Proof. We reorder the congruence steps so that whenever we use
ConL in one derivation, we use ConR in the other, and vice versa.
Confluence (Lemma 2) proves the end results equal. □

Lemma 5 (Concurrency serializes). If $\cdot \vdash p \mathbin{{}_{W_p}\|_{W_q}} q : \tau = (R, W)$
and $\vdash PK : \tau$ then $\langle p \mathbin{{}_{W_p}\|_{W_q}} q, \langle PK, W \rangle \rangle \to^* \langle id, \langle PK', W \rangle \rangle$ iff
$\langle p; q, \langle PK, W \rangle \rangle \to^* \langle id, \langle PK', W \rangle \rangle$.

Proof. Rewriting derivations by confluence (Lemma 2) to run $p$ using
(ConL/SeqL) and then $q$ (nesting in ConR under concurrency).
We rely on auxiliary lemmas relating, for all $p$, $p$'s behavior on
$PK \setminus W_q$ and on $PK$ (when $R_p \cap W_q = W_p \cap W_q = \emptyset$). □

5. Modeling RMT and FlexPipe architectures 

In addition to serving programmers at a user level, our language 
of network policies can model the hardware layout of a switch's 
packet-processing pipeline. When we interpret Concurrent NetCore 
policies as pipelines, table variables represent TCAM or SRAM 
tables, and combinators describe how tables are connected. 

Figure 6 presents models for the RMT and FlexPipe architec- 
tures at a finer level of detail than in Section 2.2. Both the RMT 
and FlexPipe architectures share some physical characteristics, in- 
cluding the physical layout of hardware tables. These physical ta- 
bles are built from SRAM or TCAM memory and hold rules that 
match on packet header fields and, depending on the results of the 
match, modify the packet header. Each table has a fixed amount 
of memory, but it can be reconfigured, in the same way the height 
and width of a rectangle can vary as the area remains constant. The 
width of a table is determined by the number of bits it matches on 
from the packet header, and the height determines the number of 
rules it can hold. Hence, knowing in advance that the controller 
will only ever install rules that match on the src is valuable in- 
formation, as it allows more rules to be installed. Although both 
chips support complex operations — such as adding and removing 
fields, arithmetic, checksums, and field encryption — we only model 
rewriting the value of header fields. 

Physical tables are so-called match/action tables: the table com- 
prises an ordered list of rules matching some fields on the header of 
a packet. The table selects a matching rule and executes its corre- 
sponding action. We model physical tables in the pipeline as table 
variables, so we must be careful that our compiler only substitutes 
in policies that look like rules in a match/action table. In an imple- 
mentation of a compiler from Concurrent NetCore to a switch, we 
would have to actually translate the rule-like policies to the switch- 
specific rule population instructions. In our model and the proofs 
of correctness, we treat policies of the form 

matches; crossbar; actions 

as rules (the translation to syntactically correct OpenFlow rules is 
straightforward enough at this point). The matches policy matches 
some fields and selects actions to perform; the crossbar policy col- 
lects the actions selected, and then the actions policy runs them. 
(We elaborate on these phases below.) We believe that this is an ad- 
equate model, since it would not be hard to translate CNC policies 
in this form to rules for a particular switch. Our model requires that 
run-time updates to physical tables be of the form above; i.e., the
binding $b(x : \tau)$ (Definition 1) has a rule-like tripartite structure.

If statements. Before examining each physical table stage in
detail, it is worth noting that the multicast combinator also serves
as a form of disjunction. For example, consider the policy
$a; p + \neg a; q$. The packet splits into two copies, but the predicates on
the left- and right-hand sides of + are disjoint, so at least one copy
will always be dropped. Hence, this particular form never actually
produces multiple packet copies. It is useful to know syntactically
that no multicast happens; as we will see, physical
table stages contain sequences of nested if statements. We write
(if $a$ then $p$ else $q$) for $(a; p + \neg a; q)$.



Physical tables. Each variable mapped to a physical table by the
binding $b(x : \tau)$ comprises three stages. The match stage is first. A
single match ($match_i$) sets the metadata field $act_i$ based on a subset
of fields drawn from the packet header. These fields implicitly
determine the width of the match. The metadata field $act_i$ holds
an action identifier $A_{jk}$, a stand-in for the slightly more structured
action languages of the RMT and FlexPipe chips. By convention,
the $j$ index of action identifiers groups updates to the same fields.
For example, $A_{11} \ldots A_{1k}$ might correspond to updating the src
field, $A_{21} \ldots A_{2j}$ the dst field, and so on. By construction, action
selection is written to a metadata field $act_i$ that is unique to that
match, allowing for the match stage to execute multiple matches
concurrently. Once the $act_i$ fields are set, the physical table has a
crossbar that combines the metadata fields and selects the actions
to execute, which we model with metadata fields $doA_{jk}$, one for
each $A_{jk}$. Each field $doA_{jk}$ is consumed by an action stage, which
runs the corresponding actions on the packet. Each $action_f$ stage
tests for actions denoting updates to field $f$, which allows actions
to execute concurrently.

As an example, suppose we would like to compile the routing
and firewall policies ($r \| w$) from Figure 2 as a single physical
table.

$$\begin{aligned}
r &= in = 1; out \leftarrow 2 + in = 2; out \leftarrow 1 \\
w &= in = 1; (typ = ssh + typ = http) + \neg(in = 1)
\end{aligned}$$

First, let's fix four concrete action values: we'll say that a value
of 11 means "modify the out field to 1" ($out \leftarrow 1$); a value of
12 means "modify the out field to 2" ($out \leftarrow 2$); a value of 31
means "do nothing" (id); and a value of 41 means "drop the packet"
(drop). We begin by defining two concurrent match stages, one
each for $r$ and $w$.



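$$\begin{aligned}
match_r = {} & \text{if } in = 1 \text{ then } act_r \leftarrow 12 \\
& \text{else if } in = 2 \text{ then } act_r \leftarrow 11 \\
& \text{else } act_r \leftarrow 41 \\
match_w = {} & \text{if } in = 1; typ = ssh \text{ then } act_w \leftarrow 31 \\
& \text{else if } in = 1; typ = http \text{ then } act_w \leftarrow 31 \\
& \text{else if } in = 1 \text{ then } act_w \leftarrow 41 \\
& \text{else } act_w \leftarrow 31 \\
matches = {} & match_r \| match_w
\end{aligned}$$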



The $match_r$ construct mirrors the structure of $r$, but rather than
modifying the out field directly, it assigns an action identifier
to the $act_r$ metadata field. Encoding $w$ is slightly more complex,
thanks to the presence of disjunction (+) and negation. But
it follows a similar pattern: in addition to converting $w$ to a
sequence of nested if statements, $match_w$ assigns an action identifier
to the $act_w$ metadata field in place of taking an action directly. The
matches stage is made up of $match_r$ and $match_w$ composed
concurrently.
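The two match stages can be read as ordinary nested conditionals that write only to their own metadata field. The following Python sketch is an illustration under assumed encodings (packets as dicts, metadata fields named `act_r` and `act_w`); it is not the switch's rule format.

```python
def match_r(pkt):
    """Match stage for the routing policy r: records an action id in act_r."""
    out = dict(pkt)
    if pkt["in"] == 1:
        out["act_r"] = 12      # out <- 2
    elif pkt["in"] == 2:
        out["act_r"] = 11      # out <- 1
    else:
        out["act_r"] = 41      # drop
    return out

def match_w(pkt):
    """Match stage for the firewall policy w: records an action id in act_w."""
    out = dict(pkt)
    if pkt["in"] == 1 and pkt["typ"] == "ssh":
        out["act_w"] = 31      # id
    elif pkt["in"] == 1 and pkt["typ"] == "http":
        out["act_w"] = 31      # id
    elif pkt["in"] == 1:
        out["act_w"] = 41      # drop
    else:
        out["act_w"] = 31      # id
    return out

def matches(pkt):
    # The stages read header fields and write disjoint metadata fields,
    # so either order gives the same result as running them concurrently.
    return match_w(match_r(pkt))

print(matches({"in": 1, "typ": "ssh"}))
```

Since neither stage reads the field the other writes, `match_r(match_w(pkt))` and `match_w(match_r(pkt))` agree on every packet, which is the property that lets the hardware evaluate them side by side.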

The crossbar stage collects the action values assigned in the 
matches stage in order to communicate them to the actions stage, 
where modifications to the packet header fields occur. 

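$$\begin{aligned}
crossbar = {} & \text{if } act_r = 11 + act_w = 11 \text{ then } do_{11} \leftarrow 1 \\
& \text{else if } act_r = 12 + act_w = 12 \text{ then } do_{12} \leftarrow 1 \\
& \text{else if } act_r = 31 + act_w = 31 \text{ then } do_{31} \leftarrow 1 \\
& \text{else if } act_r = 41 + act_w = 41 \text{ then } do_{41} \leftarrow 1 \\
& \text{else } drop
\end{aligned}$$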



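Physical tables

$$\begin{aligned}
match_i = {} & \text{if } f_{11} = v_{11}; \ldots; f_{1n} = v_{1n} \text{ then } act_i \leftarrow A_{j1} \\
& \text{else if } f_{21} = v_{21}; \ldots; f_{2m} = v_{2m} \text{ then } act_i \leftarrow A_{k2} \\
& \text{else } \ldots \\
matches = {} & match_1 \| match_2 \| \ldots \\
crossbar = {} & \text{if } act_1 = A_{11} + act_2 = A_{11} + \ldots \text{ then } doA_{11} \leftarrow 1 \\
& \text{else if } act_1 = A_{21} + act_2 = A_{21} + \ldots \text{ then } doA_{21} \leftarrow 1 \\
& \text{else } \ldots \\
action_j = {} & \text{if } doA_{j1} = 1 \text{ then perform } A_{j1}\text{'s writes} \\
& \text{else if } doA_{j2} = 1 \text{ then perform } A_{j2}\text{'s writes} \\
actions = {} & action_1 \| action_2 \| \ldots \\
physical = {} & x : \tau \\
T_b(physical) = {} & matches; crossbar; actions
\end{aligned}$$

RMT model

$$\begin{aligned}
multicast = {} & (out_1 = 1; f_{tag} \leftarrow v_1; out \leftarrow 1) \\
& + (out_2 = 2; f_{tag} \leftarrow v_2; out \leftarrow 2) \\
& + \ldots \\
pipeline = {} & physical_1; \ldots; physical_k; multicast; physical_{k+1}; \ldots; physical_{32} \quad \text{where } k < 16
\end{aligned}$$

FlexPipe model

$$\begin{aligned}
mirror = {} & m = 0 + m_i = 1; m \leftarrow i \\
egress = {} & \text{if } f_{11} = v_{11}; \ldots; f_{1n} = v_{1n} \text{ then } f'_{11} \leftarrow v'_{11}; \ldots; f'_{1n} \leftarrow v'_{1n} \\
& \text{else if } f_{21} = v_{21}; \ldots; f_{2m} = v_{2m} \text{ then } f'_{21} \leftarrow v'_{21}; \ldots; f'_{2m} \leftarrow v'_{2m} \\
& \text{else } \ldots \\
flood = {} & out_i = 1; out \leftarrow i \\
pair = {} & physical_1; physical_2 \\
diamond = {} & pair_1; (pair_2 \| pair_3); pair_4 \\
pipeline = {} & diamond; mirror; egress; flood
\end{aligned}$$

Figure 6. Modeling RMT and Intel FlexPipe.

The actions stage consumes the output of the crossbar in order
to effect modifications to the header fields. Actions on the same
field are grouped; in this case, modifications to the out field are
handled by $action_{out}$. This allows each action group to be executed
concurrently, because they operate on different fields by construction.

$$\begin{aligned}
action_{out} = {} & \text{if } do_{11} = 1 \text{ then } out \leftarrow 1 \\
& \text{else if } do_{12} = 1 \text{ then } out \leftarrow 2 \\
& \text{else } id \\
action_{id} = {} & \text{if } do_{31} = 1 \text{ then } id \text{ else } id \\
action_{drop} = {} & \text{if } do_{41} = 1 \text{ then } drop \text{ else } id \\
actions = {} & action_{out} \| action_{id} \| action_{drop}
\end{aligned}$$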



Separating tables into three stages may seem excessive, but suppose
$r$ also modified the typ field. In this case, $r \| w$ is no longer
well typed (because $r$ writes to typ while $w$ reads from it), but
we may still extract concurrency from $w; r$: by splitting reading
and writing into separate phases, the match stage for applying
the access control policy ($match_w$) can run concurrently with the
match determining the output port ($match_r$) with little change from
the example above. Concurrent processing like this is a key feature
of both the RMT and FlexPipe architectures.

RMT. The RMT chip provides a thirty-two table pipeline divided
into ingress and egress stages, which are separated by a multicast 
stage. As a packet arrives, tables in the ingress pipeline act upon 
it before it reaches the multicast stage. To indicate that the packet 
should be duplicated, ingress tables mark a set of metadata fields 
corresponding to output ports on the switch. The multicast stage 
maintains a set of queues, one per output port. The chip enqueues 
a copy of the packet (really a copy of the packet's header and 
a pointer to the packet's body) into those queues selected by the 
metadata, optionally marking each copy with a distinct tag. Finally, 
tables in the egress pipeline process each copy of the packet. 

We model the multicast stage as the parallel composition of 
sequences of tests on header and metadata fields followed by the 
assignment of a unique value tag and an output port, where each 
summand corresponds to a queue in the RMT architecture. We 
model the ingress and egress pipelines as sequences of tables, 
where each of the thirty-two tables may be assigned to one pipeline
or the other, but not both. The RMT architecture makes it possible 
to divide a single physical table into pieces and assign each piece 
to a different pipeline. We leave modeling this as future work. 



FlexPipe. While physical tables have built-in concurrency within 
match and action stages, the FlexPipe architecture also makes use 
of concurrency between physical tables. The ingress pipeline is ar- 
ranged in a diamond shape. Each point of the diamond is built from 
two tables in sequence, with incoming packets first processed by 
the first pair, then concurrently by the next two pairs, and finally 
by the last pair. This built-in concurrency is optimized for common 
networking tasks, such as checking packets against an access con- 
trol list while simultaneously calculating routing behavior — as in 
our firewall example of Figure 2. 

The FlexPipe architecture breaks multicast into two stages sep- 
arated by a single egress stage. The mirror stage makes up to four 
additional copies of the packet. Each copy sets a unique identifier to 
a metadata field m and writes to a bitmap out corresponding to the 
ports on which this copy will eventually be emitted — this allows for 
up to five potentially modified packets to be emitted from each port 
for each input packet. The egress stage matches on the metadata 
field m and various other fields to determine which modifications 
should be applied to the packet, and then applies those correspond- 
ing updates. Finally, the flood stage emits a copy of each mirrored 
packet on the ports set in its out bitmap. 

6. Compilation 

Compilation consists of several passes, each of which addresses a 
discrepancy between the expressivity of the high-level policy and 
the physical restrictions of the hardware model. In this section, we 
target the RMT architecture. 

• Multicast consolidation transforms a policy with arbitrary oc- 
currences of multicast (+) into a pipelined policy wherein mul- 
ticast occurs at just a single stage. 

• Field extraction moves modifications of a given field to an 
earlier stage of a pipelined policy. 

• Table fitting partitions a pipelined policy into a sequence of 
tables, possibly combining multiple policy fragments into a 
single table. 

Each pass takes a well-typed policy as input and produces an
equivalent, refactored policy as well as a binding transformer as
output.

Definition 2 (Binding transformer). A binding transformer $\theta$ is an
operator on table bindings $b$.

$$\theta \in (\mathrm{Var} \rightharpoonup \mathrm{Policy}) \to (\mathrm{Var} \rightharpoonup \mathrm{Policy})$$

Binding transformers play the role of the "generated rule trans- 
lator" from Figure 1. In other words, during the switch population 
phase, the controller will issue table bindings b — essentially, clos- 
ing substitutions, see Definition 1 — in terms of the original policy, 
pre-compilation. It is the binding transformer $\theta$'s job to transform
these table bindings so that they can be applied sensibly to the post- 
compilation pipeline configured on the switch. 

6.1 Multicast consolidation 

There are two important differences between the kind of multicast 
that Concurrent NetCore offers and the kind supported by the 
RMT pipeline. First, multicast may not occur arbitrarily in the 
RMT pipeline; rather, there is a fixed multicast stage sandwiched 
between two pipelines. Second, the multicast stage must know 
the destination output port of each packet copy at the time the 
packet is copied. We use multicast consolidation to rewrite a high- 
level policy into a form with a distinct multicast stage. The next 
section describes how we use field extraction to extract potential 
modifications to a given field from a subpolicy — which we will use 
to isolate writes to the output port to the multicast stage. 

Informally, multicast consolidation works as follows. Suppose a 
policy p contains two instances of parallel composition, along with 
subpolicies q, r, and s that do not contain parallel composition. 

$$p = q + r + s$$

Multicast consolidation rewrites p into two stages: the consolida- 
tion stage makes three copies of the packet and sets a fresh metadata 
field unique to each packet. 

$$p_c = f_1 \leftarrow 1 + f_2 \leftarrow 1 + f_3 \leftarrow 1$$

Next, the egress stage replaces the original occurrences of multicast 
in p with a sequence of tests on the new metadata fields. 

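$$\begin{array}{rcl}
p_e & = & \mathsf{if}\ f_1 = 1\ \mathsf{then}\ q\ \mathsf{else}\ \mathsf{id}; \\
    &   & \mathsf{if}\ f_2 = 1\ \mathsf{then}\ r\ \mathsf{else}\ \mathsf{id}; \\
    &   & \mathsf{if}\ f_3 = 1\ \mathsf{then}\ s\ \mathsf{else}\ \mathsf{id}
\end{array}$$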

The consolidation and egress stages are composed sequentially. By 
convention, fresh metadata fields are initialized to zero. Hence, 
$p_c; p_e$ acts equivalently to $p$, producing at most three packets: one
processed by q, another by r, and a third by s. 
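
The equivalence of $p$ and $p_c; p_e$ can be checked on a toy model. The following Python sketch is an informal illustration only, not the operational semantics of the paper: it assumes that a packet is a dictionary of fields and that a policy is a function from one packet to a list of packets. The helper names (assign, seq, par, ite, strip_meta) and the sample subpolicies are hypothetical.

```python
# Toy model (an assumption for illustration): a packet is a dict of fields,
# and a policy maps one packet to a list of output packets.

def assign(field, value):
    """Field modification: field <- value."""
    return lambda pkt: [{**pkt, field: value}]

def seq(*policies):
    """Sequential composition p; q; ..."""
    def run(pkt):
        pkts = [pkt]
        for pol in policies:
            pkts = [out for p in pkts for out in pol(p)]
        return pkts
    return run

def par(*policies):
    """Parallel composition p + q + ...: every branch gets its own copy."""
    return lambda pkt: [out for pol in policies for out in pol(pkt)]

def ite(field, value, then_pol):
    """if field = value then then_pol else id (unset metadata reads as 0)."""
    return lambda pkt: then_pol(pkt) if pkt.get(field, 0) == value else [pkt]

def strip_meta(pkts, meta):
    """Metadata is not observable once a packet has left the switch."""
    return sorted(
        tuple(sorted((k, v) for k, v in p.items() if k not in meta)) for p in pkts
    )

# Hypothetical multicast-free subpolicies q, r, and s.
q = assign("out", 1)
r = seq(assign("vlan", 7), assign("out", 2))
s = assign("out", 3)

p = par(q, r, s)                                              # p = q + r + s
p_c = par(assign("f1", 1), assign("f2", 1), assign("f3", 1))  # consolidation
p_e = seq(ite("f1", 1, q), ite("f2", 1, r), ite("f3", 1, s))  # egress

pkt = {"src": "10.0.0.1", "vlan": 0, "out": 0}
meta = {"f1", "f2", "f3"}
original = strip_meta(p(pkt), meta)
compiled = strip_meta(seq(p_c, p_e)(pkt), meta)
```

Under these assumptions, original and compiled contain the same three packets once the tag fields are hidden.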

To capture this formally, we define syntactically restricted forms 
for the consolidation and egress stages that model consolidated 
packet duplication and tagging. The consolidation form is similar 
to the multicast stage presented in Figure 6 but slightly higher- 
level, in that it may contain table variables and additional field 
modifications — later compilation phases will factor these out. 

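Definition 3 (Multicast consolidation stages).

$$\begin{array}{llcl}
\text{consolidation sequence} & s & ::= & \prod_i f_i \leftarrow 1 \mid (x : \tau); \prod_i f_i \leftarrow 1 \\
\text{consolidation stage} & m \in M & ::= & \sum_i a_i; s_i \\
\text{egress stage} & n \in N & ::= & \mathsf{id} \mid x : \tau \mid n; r \\
 & & \mid & n; \mathsf{if}\ \prod_i f_i = 1\ \mathsf{then}\ r\ \mathsf{else}\ \mathsf{id}
\end{array}$$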

A consolidation stage is the sum of zero or more predicated 
consolidation sequences, each of which assigns to a set of fields 
(used for tagging each packet copy for later processing). We use 
the product notation $\prod_i f_i \leftarrow v_i$ to stand for a sequence of field
modifications $f_1 \leftarrow v_1; \ldots; f_n \leftarrow v_n$. Sequences may optionally
begin with tables, which allows for multicast to be increased or 
decreased at run time. 

An egress stage consists of a sequence of smaller policies. The 
sequence may begin with a table, which allows the egress stage
to grow or shrink at population time; otherwise, it begins with id.
Each remaining subpolicy takes one of two forms. Either it is drawn 
from the fragment of CNC that does not contain multicast, which 
we represent with the metavariable r, or it may be a multicast-free 
fragment embedded within an if statement that tests some subset 
of the metadata fields set in the consolidation stage. Intuitively, r 
alone represents a part of the original policy to be applied to all 
multicast copies of the packet, whereas an if statement selectively 
applies the policy it wraps to some copies of the packet, leaving 
others untouched. 

Definition 4 (Multicast consolidation). 

$$\mathsf{pipeline} :: (\mathsf{Var} \to \mathsf{Nat}) \to \mathsf{Policy} \rightharpoonup (M \times N \times \Theta)$$

Given an arbitrary policy p, the function pipeline s p factors the 
policy into a consolidation stage m followed by an egress stage n. 
The argument s is a user-supplied hint mapping each table to the
number of copies it may make of a packet. The pipeline function 
is syntax-directed and presented in its entirety in the technical 
appendix; we highlight two interesting cases here. As one might 
expect, the bulk of the work takes place in the multicast case: 

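$$\begin{array}{l}
\mathsf{pipeline}\ s\ (p + q) = \\
\quad \mathsf{let}\ f = \text{a fresh metadata field}\ \mathsf{in} \\
\quad \mathsf{let}\ \sum_i m_i, n_1, \theta_1 = \mathsf{pipeline}\ s\ p\ \mathsf{in} \\
\quad \mathsf{let}\ \sum_j m_j, n_2, \theta_2 = \mathsf{pipeline}\ s\ q\ \mathsf{in} \\
\quad \mathsf{let}\ n_3, \theta_3 = \mathsf{qualify}\ (f = 0, n_1)\ \mathsf{in} \\
\quad \mathsf{let}\ n_4, \theta_4 = \mathsf{qualify}\ (f = 1, n_2)\ \mathsf{in} \\
\quad ((\sum_i m_i; f \leftarrow 0) + (\sum_j m_j; f \leftarrow 1),\ n_3; n_4,\ \theta_1 \circ \theta_2 \circ \theta_3 \circ \theta_4)
\end{array}$$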

Given a policy $p + q$, our strategy is as follows. First, recursively
consolidate $p$ and $q$. Then, pick a fresh field $f$ that neither $p$ nor $q$
uses. For each summand in the consolidation stage produced from $p$,
set $f$ to 0, and assign 1 to $f$ in summands produced from $q$. Next,
predicate each egress pipeline from $p$ with $f = 0$ and from $q$ with
$f = 1$: the qualify function transforms if $a$ then $n$ else id into
an egress pipeline $n$ with the predicate $a$ conjoined to the guard
in each subsequent if statement. Finally, note that by construction,
$\theta$ functions extend the domain of table bindings to accommodate
new table variables. Hence, we can simply compose the $\theta$ functions
produced by recursive compilation.
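
The tagging strategy for the multicast case can be sketched in Python. This is a deliberately simplified illustration, not the pipeline function of Definition 4: it assumes policies are built only from multicast-free leaves (strings) and binary sums (tuples of the form ("+", p, q)), and it omits table variables and binding transformers altogether. The representation of stages as lists of dictionaries is our own assumption.

```python
from itertools import count

def pipeline(policy, fresh=None):
    """Simplified multicast consolidation for leaves and binary '+' only.

    Returns (m, n) where
      m is the consolidation stage: a list of summands, each a dict of
        tag assignments (field -> 0 or 1);
      n is the egress stage: a list of (guard, leaf) pairs, each read as
        `if guard then leaf else id`, with guard a dict of field tests.
    """
    fresh = fresh if fresh is not None else count(1)
    if isinstance(policy, str):            # multicast-free leaf
        return [{}], [({}, policy)]
    _, p, q = policy
    f = f"f{next(fresh)}"                  # a fresh metadata field
    m1, n1 = pipeline(p, fresh)
    m2, n2 = pipeline(q, fresh)
    # Summands from p set f to 0; summands from q set f to 1.
    m = [{**tags, f: 0} for tags in m1] + [{**tags, f: 1} for tags in m2]
    # "qualify": conjoin f = 0 (resp. f = 1) onto every guard.
    n = [({**g, f: 0}, leaf) for g, leaf in n1] + \
        [({**g, f: 1}, leaf) for g, leaf in n2]
    return m, n

def matches(tags, guard):
    """A guard holds if every tested field has the expected value (default 0)."""
    return all(tags.get(field, 0) == value for field, value in guard.items())

m, n = pipeline(("+", ("+", "q", "r"), "s"))
# For each packet copy, the leaves whose guard it satisfies.
applied = [[leaf for guard, leaf in n if matches(tags, guard)] for tags in m]
```

In this sketch each copy produced by the consolidation stage satisfies exactly one guard in the egress stage, so each copy is processed by exactly one of q, r, and s.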

Table variables are the other tricky case — we must use the 
s argument to see how much more multicast has been reserved, 
deferring some of the multicast consolidation to rewrites that will 
occur during the population phase. 

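$$\begin{array}{l}
\mathsf{pipeline}\ s\ (x : \tau) = \\
\quad \mathsf{let}\ fs = s(x)\ \text{fresh metadata fields}\ \mathsf{in} \\
\quad \mathsf{let}\ t_m = y : (\{\}, fs)\ \mathsf{in} \\
\quad \mathsf{let}\ t_n = z : (\tau.1 \cup fs, \tau.2)\ \mathsf{in} \\
\quad \mathsf{let}\ \theta' = (\lambda b, w.\ \mathsf{let}\ m, n, \theta = \mathsf{pipeline}\ s\ (b\ x)\ \mathsf{in} \\
\quad \qquad \mathsf{if}\ w = y\ \mathsf{then}\ m\ \mathsf{else\ if}\ w = z\ \mathsf{then}\ n\ \mathsf{else}\ T_{\theta\,b}\ w)\ \mathsf{in} \\
\quad (t_m, t_n, \theta')
\end{array}$$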

Applied to a table variable, the pipeline function produces a $\theta$
function that, in turn, compiles all future table updates, using the
$s$ map to preallocate metadata fields for future updates. A key
property of table updates is that they produce closed terms; hence,
invoking pipeline inside $\theta$ on the updated table $b\ x$ runs no risk of
divergence.

Example. As a brief example, let's look at how multicast consolidation
will work on the $r + m$ fragment of the example policy from
Section 2. Recall that $m$ contains a table variable, which may
introduce more multicast later. The compiler relies on a hint, $s$, that
pre-allocates metadata fields corresponding to the amount of multicast
that future updates may contain. Let $fs = s\ x$ be a set of such
fields. The policy produced by pipeline $s$ $(r + m)$ will be

$$\begin{array}{l}
(f \leftarrow 0 + y : (\{typ, src\}, fs); f \leftarrow 1); \\
(\mathsf{if}\ f = 0\ \mathsf{then}\ r\ \mathsf{else}\ \mathsf{id}); \\
(\mathsf{if}\ f = 1\ \mathsf{then}\ z : (\{typ, src\} \cup fs, \{out\})\ \mathsf{else}\ \mathsf{id}),
\end{array}$$

and the binding transformer $\theta$ will be

$$\begin{array}{l}
(\lambda b, w.\ \mathsf{let}\ q, r, \theta' = \mathsf{pipeline}\ s\ (T_b\ x)\ \mathsf{in} \\
\quad \mathsf{if}\ w = y\ \mathsf{then}\ q\ \mathsf{else\ if}\ w = z\ \mathsf{then}\ r\ \mathsf{else}\ T_b\ w).
\end{array}$$

We introduce a fresh metadata field $f$ to consolidate multicast in
a single stage and tag each packet copy, and the remainder of the
policy uses the tag to determine whether to apply $r$ or $m$ to each
fragment. Because $m$ contains a table variable $x$, we also add new
tables $y$ and $z$ to handle any multicast that $m$ may contain in the
future, and we produce a function $\theta$ to ensure this.

Suppose an update to $x$ arrives as part of a table binding, $b$.
Applying $\theta\ b$ to the compiled policy will consolidate any multicast
present in $b$ and install appropriate policies in $y$ and $z$. Since $T_b\ x$
produces a closed policy, $\theta'$ is always the identity function.

Proof of semantic preservation. Finally, we prove that the origi- 
nal policy is equivalent to the compiled policy for all table updates. 
We use z to model the fact that metadata is initially assigned a value 
of 0 when the packet arrives at the switch, and that metadata is not 
observable once the packet has left the switch. The proof proceeds 
by induction on the structure of the policy p. 

Lemma 6 (Multicast consolidation preserves semantics). Let $fs$
be the metadata fields used to tag multicast packets, and let
$z = \prod_{f \in fs} f \leftarrow 0$. If $\vdash p : \tau$ and $m, n, \theta = \mathsf{pipeline}\ s\ p$, then
$T_b\,(z; p; z) = T_{\theta\,b}\,(z; m; n; z)$.

Proof. By induction on the structure of p, relying on Lemmas 4 and 
5 and the axioms of NetKAT [3] to establish equivalence. 

□ 

6.2 Field extraction 

The RMT architecture also requires that the output port of each 
packet be set during the multicast stage. Field extraction examines 
a policy to determine all the conditions under which a given field 
modification may take place, and then rewrites the policy so that 
modifications to that field happen first. For example, suppose we 
wish to extract modifications to the field $f$ from this policy.

$$\mathsf{if}\ b\ \mathsf{then}\ f \leftarrow v_1; p\ \mathsf{else}\ f \leftarrow v_2; q$$

Either $f$ is set to $v_1$ or $v_2$, and the predicate $b$ determines which
occurs. Using a fresh field $f'$, we can rewrite this policy.

$$(b; f \leftarrow v_1; f' \leftarrow 0 + \neg b; f \leftarrow v_2; f' \leftarrow 1);\ \mathsf{if}\ f' = 0\ \mathsf{then}\ p\ \mathsf{else}\ q$$

Introducing $f'$ is necessary because $b$ may depend on the value of
$f$. For example, suppose $b$ is $f = v_3$. The clause $f' = 0$ in the if
statement ensures that $p$ is executed if $f$ was set to $v_1$.
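
To see concretely why the fresh field is needed, consider the following Python sketch. It is an informal model under our own assumptions: a packet is a dictionary, the predicate $b$ is the test f = v3, and the subpolicies p and q are stood in for by recording which branch ran in a hypothetical field named ran. Because $b$ and $\neg b$ are mutually exclusive, the two-summand modification stage is modeled by an ordinary conditional.

```python
def original(pkt, v1, v2, v3):
    """if f = v3 then (f <- v1; p) else (f <- v2; q)."""
    pkt = dict(pkt)
    if pkt["f"] == v3:
        pkt["f"], pkt["ran"] = v1, "p"
    else:
        pkt["f"], pkt["ran"] = v2, "q"
    return pkt

def naive(pkt, v1, v2, v3):
    """Hoists the writes to f but re-tests b afterwards (incorrect)."""
    pkt = dict(pkt)
    pkt["f"] = v1 if pkt["f"] == v3 else v2
    # b is evaluated again, but f has already been overwritten.
    pkt["ran"] = "p" if pkt["f"] == v3 else "q"
    return pkt

def extracted(pkt, v1, v2, v3):
    """Modification stage followed by a test on the fresh field f'."""
    pkt = dict(pkt)
    # (b; f <- v1; f' <- 0  +  not b; f <- v2; f' <- 1)
    if pkt["f"] == v3:
        pkt["f"], pkt["f_prime"] = v1, 0
    else:
        pkt["f"], pkt["f_prime"] = v2, 1
    # if f' = 0 then p else q
    pkt["ran"] = "p" if pkt["f_prime"] == 0 else "q"
    del pkt["f_prime"]        # metadata is not observable afterwards
    return pkt

pkt = {"f": 3}
want = original(pkt, 1, 2, 3)
good = extracted(pkt, 1, 2, 3)
bad = naive(pkt, 1, 2, 3)
```

Here the naive version runs q on a packet for which the original policy runs p, whereas the version that saves the outcome of $b$ in f_prime agrees with the original.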

We define a modification stage as a sum of all the conditions
leading to a given field being modified, coupled with the modification.
The function $\mathsf{ext}_f\ p$ splits a policy $p$ into a modification stage
for the field $f$ followed by the remainder of the policy.

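Definition 5 (Modification stage).

$$\begin{array}{llcl}
\text{modification sequence} & s & ::= & \prod_i f_i \leftarrow v_i \mid (x : \tau); \prod_i f_i \leftarrow v_i \\
\text{modification sum} & e \in E & ::= & \sum_j a_j; s_j
\end{array}$$

Definition 6 (Field extraction).

$$\mathsf{ext}_f :: \mathsf{Policy} \rightharpoonup (E \times \mathsf{Policy} \times \Theta)$$
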
The interesting case lies in extracting modification conditions from 
within an if statement. 

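$$\begin{array}{l}
\mathsf{ext}_f\ (\mathsf{if}\ b\ \mathsf{then}\ p\ \mathsf{else}\ q) = \\
\quad \mathsf{let}\ f' = \text{a fresh metadata field}\ \mathsf{in} \\
\quad \mathsf{let}\ (\sum_i a_{1i}; m_{1i}), p_1, \theta_1 = \mathsf{ext}_f\ p\ \mathsf{in} \\
\quad \mathsf{let}\ (\sum_j a_{2j}; m_{2j}), q_2, \theta_2 = \mathsf{ext}_f\ q\ \mathsf{in} \\
\quad \mathsf{let}\ e = \sum_i b; a_{1i}; m_{1i}; f' \leftarrow 0\ + \\
\quad \qquad\quad\ \sum_j \neg b; a_{2j}; m_{2j}; f' \leftarrow 1\ \mathsf{in} \\
\quad (e,\ \mathsf{if}\ f' = 0\ \mathsf{then}\ p_1\ \mathsf{else}\ q_2,\ \theta_2 \circ \theta_1)
\end{array}$$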

In this case, we begin by recursively extracting any modifications
from the branches of the if statement. We then sequence the predicate
$b$ with the conditions produced from the true branch and
$\neg b$ with those from the false branch. However, modifications $m_{1i}$
(from the recursive call $\mathsf{ext}_f\ p$) or $m_{2j}$ (from the recursive call
$\mathsf{ext}_f\ q$) might affect the predicate $b$. We therefore save $b$'s
pre-modification value in a fresh field $f'$. After we've run the
modification sums from $p$ and $q$, we produce a conditional that now tests
$f'$, which holds the original result of the predicate $b$.

As with multicast consolidation, we show that when metadata 
has been zeroed at the beginning and end of the policy, the inter- 
pretation of the original and compiled forms are equivalent for all 
table updates. 

Lemma 7 (Field extraction preserves semantics). Let $fs$ be the
metadata fields used to tag field extraction, and let
$z = \prod_{f \in fs} f \leftarrow 0$. If $\vdash r : \tau$ and $e, r', \theta = \mathsf{ext}_f\ r$, then
$T_b\,(z; r; z) = T_{\theta\,b}\,(z; e; r'; z)$.

Proof. By induction on the structure of $r$, relying on Lemmas 4 and
5 and the axioms of NetKAT [3] to establish equivalence.

□ 

Composing multicast consolidation with field extraction (on 
the out field) produces two large summations. The next step is 
to factor the summations and group summands by output port. It 
is unclear whether/how the RMT architecture supports emitting 
multiple copies of a packet out the same output port, and so we 
reject programs of that shape here — we stick to a set semantics, 
though we can simulate a bag semantics with metadata fields. Now, 
valid policies consist of a single large summation of tests followed 
by modifications, ending with modification of the out field. 

$$\sum_i \prod_j f_{ij} = v_{ij};\ \prod_k f_{ik} \leftarrow v_{ik};\ out \leftarrow i$$

A final transformation splits this summation into a sequence of 
three smaller summations, of which the middle aligns precisely 
with the multicast stage of the RMT pipeline. 

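$$\begin{array}{l}
(\sum_i \prod_j f_{ij} = v_{ij};\ out_i \leftarrow 1); \\
(\sum_i out_i = 1;\ f_{tag} \leftarrow i;\ out \leftarrow i); \\
(\sum_i f_{tag} = i;\ \prod_k f_{ik} \leftarrow v_{ik})
\end{array}$$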

We have not yet proved that this transformation is semantics pre- 
serving, although we expect that doing so is straightforward. The 
next section presents techniques for compiling these, and other 
policies, to physical table format. 

6.3 Table fitting 

At this stage of the compilation process, every occurrence of par- 
allel composition has been consolidated to a single multicast stage, 
appropriate for deployment to the RMT's multicast stage. What 
remains are table variables, predicates, field modifications, and if 
statements joined by sequential and concurrent combinators. Two 
tasks remain to match the policy with the architecture model. First, 
predicates, field modifications, and if statements must be replaced 
by table variables. A binding transformer will reinstate these policy
fragments into the tables at population time. Second, the table
variables in the user policy must be fitted to the table variables in 
the architecture model. 

Both steps depend on a second compilation algorithm to com- 
pile a table-free user policy to a single physical table. With a few 
small modifications, we can adapt the compilation algorithm de- 
scribed in [3] for compiling policies to a physical table format. We 
call this single table compilation. 

$$\mathsf{single\_table} :: \mathsf{Policy} \to \mathsf{Policy}$$

The resulting policy fits the shape of the matches; crossbar; actions 
table format described in Figure 6. The extension to the algorithm 
described in [3] is straightforward, and we defer a complete pre- 
sentation to a technical report. 

Table insertion. At configuration time, the RMT switch consists
solely of tables arranged via sequential and concurrent composition.
Non-table elements in the user policy are fixed (i.e., the
topology of tables cannot change at population time, when the switch
is "running"), but they cannot be installed directly on the switch at
configuration time. Rather, they must be replaced by table variables,
and then reinstalled at population time by a binding transformation.
As a small example, consider the policy $typ = http; x : \tau$.
No matter which policies are installed into $x$ at population time,
they will always be preceded by the filter on the typ field. Hence,
we can produce a new policy with a fresh table variable,
$(y : (\{\}, \{typ\})); (x : \tau)$, and a binding transformation

$$(\lambda b, w.\ \mathsf{if}\ w = y\ \mathsf{then}\ typ = http\ \mathsf{else}\ T_b\ w).$$
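
The role of this binding transformation can be pictured with a small Python sketch. It is only an illustration of the shape of a binding transformer: we assume a table binding is a dictionary from table names to policies, and policies are written as plain strings rather than CNC terms. The function name theta and the sample update are hypothetical.

```python
# Assumption: a table binding is a dict from table names to policies, and
# policies are plain strings here; only the shape of theta matters.

def theta(b):
    """Binding transformer for `typ = http; x : tau` after table insertion.

    The fresh table y always receives the fixed filter; every other
    table keeps whatever the controller's binding b assigns to it.
    """
    def transformed(w):
        if w == "y":
            return "typ = http"
        return b[w]
    return transformed

user_binding = {"x": "out <- 3"}        # issued against the original policy
switch_binding = theta(user_binding)    # usable with the tables y and x
installed = [switch_binding("y"), switch_binding("x")]
```

The controller keeps issuing bindings for x alone, and the transformed binding supplies the contents of y on its behalf.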

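Definition 7 (Table insertion).

$$\begin{array}{rcl}
\mathsf{insert\_table}\ (p; q) & = & \mathsf{let}\ p', \theta_p = \mathsf{insert\_table}\ p\ \mathsf{in} \\
 & & \mathsf{let}\ q', \theta_q = \mathsf{insert\_table}\ q\ \mathsf{in} \\
 & & (p'; q',\ \theta_p \circ \theta_q) \\
\mathsf{insert\_table}\ (p \parallel q) & = & \mathsf{let}\ p', \theta_p = \mathsf{insert\_table}\ p\ \mathsf{in} \\
 & & \mathsf{let}\ q', \theta_q = \mathsf{insert\_table}\ q\ \mathsf{in} \\
 & & (p' \parallel q',\ \theta_p \circ \theta_q) \\
\mathsf{insert\_table}\ p & = & (x : \tau,\ (\lambda b, w. \\
 & & \quad \mathsf{if}\ w = x\ \mathsf{then}\ \mathsf{single\_table}\ (T_b\ p) \\
 & & \quad \mathsf{else}\ T_b\ w))
\end{array}$$
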

After completing this step, the transformed user policy consists 
of table variables and sequential and concurrent combinators. We 
don't define a case for parallel composition because all of the 
multicast has already been consolidated. 

Table fitting. Single-table compilation comes with a cost: the
number of rules in the compiled table grows exponentially with
the number of sequential combinators in the original policy. However,
thanks to the concurrency inherent within physical tables,
the policy $p \parallel q$ does not incur any overhead when installed in
a single table. This leads to a choice. Suppose we have a policy
$(p; (q \parallel r)) : \tau$ that we would like to compile to a sequence of two
tables, $(x_1 : \tau); (x_2 : \tau)$. Recall that concurrency is commutative
(Lemma 4) and equivalent to sequential composition (Lemma 5).
Hence, there are four ways we might compile this policy.

In the first case, $p$ is compiled to $x_1$ and $q \parallel r$ to $x_2$. The cost
of $p$ (written $|p|$) refers to the number of TCAM or SRAM rows
the compiled policy fills. The cost of placing $q$ and $r$ in the same
table is $|q| + |r|$. In the next, the division is $p; q$ and $r$, and here, the
cost of placing $p$ and $q$ in the same table is multiplicative in their
sizes. Similarly, $p; r$ might be placed in $x_1$ and $q$ in $x_2$ at the cost
$|p| * |r| + |q|$. Finally, the RMT chip has the capability to join its
physical stages together to emulate a single, larger "logical stage."
That capability provides a final option, which is to compile $p$, $q$,
and $r$ to a single table (paying the largest overhead of $|p| * |q| * |r|$).
If $p$, $q$, and $r$ are of equal size, then the first option is most efficient.



But when $p$ is small and $x_1$ has space remaining, it may make sense
to pay the cost of compiling $p; q$ or $p; r$ to $x_1$. The RMT "logical
table" feature is suitable for cases in which $p$, $q$, or $r$ are too large to
fit in a single physical table. The RMT chip limits both the number
of bits a table can match and the number of rules it can hold (each
match stage has sixteen blocks of 40b by 2048 entry TCAM
memory and eight 80b by 1024 entry SRAM blocks), so deciding
how to partition a policy into tables matters.
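
The four placements can be compared numerically. The Python sketch below simply evaluates the row counts stated above for given sizes $|p|$, $|q|$, and $|r|$; the function names, the option labels, and the sample sizes are our own, chosen for illustration.

```python
def placement_rows(p, q, r):
    """Rows per table for each placement of (p; (q || r)), given |p|, |q|, |r|.

    Sequential composition multiplies row counts and concurrent
    composition adds them, following the costs stated in the text.
    """
    return {
        "p | q||r": (p, q + r),        # x1 holds p, x2 holds q || r
        "p;q | r": (p * q, r),         # x1 holds p; q, x2 holds r
        "p;r | q": (p * r, q),         # x1 holds p; r, x2 holds q
        "logical": (p * q * r,),       # one logical table for everything
    }

def cheapest(p, q, r):
    """Placement with the fewest rows in total."""
    options = placement_rows(p, q, r)
    return min(options, key=lambda name: sum(options[name]))

equal_sizes = cheapest(100, 100, 100)
small_p = placement_rows(2, 100, 100)["p;q | r"]
```

With equal sizes of 100 rows, the first placement uses 300 rows in total against 10,100 for either mixed placement and 1,000,000 for the single logical table. With $|p| = 2$, placing $p; q$ in $x_1$ costs only 200 rows there and leaves $x_2$ with 100.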

Since there are many choices about how to fit a collection 
of tables, we have defined a dynamic programming algorithm to 
search for the best one. The goal of the algorithm is to fit a well- 
typed policy, without parallel composition, into as few tables as 
possible. 

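Definition 8 (TCAM cost measurement).

$$\begin{array}{rcl}
\mathsf{table\_cost} & \in & \mathsf{Var} \rightharpoonup \mathbb{N} \\
\mathsf{height}\ x & = & \mathsf{table\_cost}\ x \\
\mathsf{height}\ (p; q) & = & \mathsf{height}\ p * \mathsf{height}\ q \\
\mathsf{height}\ (p \parallel q) & = & \mathsf{height}\ p + \mathsf{height}\ q \\
\mathsf{blocks}\ p & = & \lceil (\mathsf{width}\ p)/40 \rceil * \lceil (\mathsf{height}\ p)/2048 \rceil
\end{array}$$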



As input, the algorithm relies on a user-supplied annotation
predicting the maximum size of each user table at population time, 
written table_cost x. We also rely on several utility functions. The 
width of a policy (width p) returns the number of bits it matches, 
while the height (height p) uses the user-supplied annotation to 
gauge the number of entries that will ever be installed into the 
policy at population time. Together, they calculate the number 
of TCAM blocks necessary to implement a policy (blocks p). 
Similar measurements exist for compiling to SRAM, but we focus 
on TCAM here. 
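
As a worked instance of Definition 8, the following Python sketch computes height and blocks for a small policy. The tuple encoding of policies, the sample table_cost values, and the 48-bit match width are assumptions made for the example; the definition of width is not reproduced here, so the width is passed in explicitly.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)

def height(policy, table_cost):
    """Entries a policy may hold: tables use the annotation, sequential
    composition multiplies, concurrent composition adds (Definition 8)."""
    kind = policy[0]
    if kind == "tbl":
        return table_cost[policy[1]]
    left = height(policy[1], table_cost)
    right = height(policy[2], table_cost)
    return left * right if kind == "seq" else left + right

def blocks(width_bits, height_entries):
    """TCAM blocks needed, with blocks 40 bits wide and 2048 entries tall."""
    return ceil_div(width_bits, 40) * ceil_div(height_entries, 2048)

# Hypothetical annotation and policy x; (y || z).
table_cost = {"x": 10, "y": 20, "z": 30}
policy = ("seq", ("tbl", "x"), ("conc", ("tbl", "y"), ("tbl", "z")))

rows = height(policy, table_cost)   # 10 * (20 + 30)
needed = blocks(48, rows)           # 48 match bits is a made-up width
```

Here the policy may hold 500 entries, and matching 48 bits requires two 40-bit blocks side by side, so two TCAM blocks suffice.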

As input, the algorithm also takes a policy containing only 
sequences of tables. The policy AST is a tree with combinators at 
the nodes and tables at the leaves. We need to flatten this tree into 
the RMT pipeline. To do so, we must consider different groupings 
of the tree's fringe. For convenience, let $t_{ij}$ represent an in-order
numbering of the leaves of the abstract syntax tree, starting with
$t_{11}$ as the leftmost leaf. For example, given a policy
$(x : \tau_x); (y : \tau_y); (z : \tau_z)$, then $t_{23}$ would be $(y : \tau_y); (z : \tau_z)$.



input : A sequence of tables $t_{1n}$
input : table_cost

 1  let m[1 ... n, 1 ... n] and s[1 ... n - 1, 2 ... n] be new tables;
 2  for i = 1 to n do
 3      m[i, i] = $\lceil (\mathsf{blocks}\ t_{ii})/16 \rceil$;
 4  end
 5  for l = 2 to n do
 6      for i = 1 to n - l + 1 do
 7          j = i + l - 1;
 8          m[i, j] = $\infty$;
 9          for k = i to j - 1 do
10              q = min(m[i, k] + m[k + 1, j], $\lceil (\mathsf{blocks}\ t_{ij})/16 \rceil$);
11              if q < m[i, j] then
12                  m[i, j] = q;
13                  s[i, j] = k;
14              end
15          end
16      end
17  end
18  return m and s

Algorithm 1: Table fitting.

The algorithm proceeds by building a table $m$, where each cell
$m[i, j]$ holds the smallest number of tables into which the sequence
$t_{ij}$ can fit. The crux of the algorithm lies on line 10. Given a
sequence $t_{ij}$ for which the optimal fit for each subsequence has
been computed, either the entire sequence may be compiled to a
single logical table that can be deployed across $\lceil (\mathsf{blocks}\ t_{ij})/16 \rceil$
physical tables, or there exists a partitioning $t_{ik}, t_{(k+1)j}$ where both
subsequences fit into sets of tables, and so the entire sequence fits
into the sum of the size of the sets. The algorithm contains three
nested loops iterating over $t_{1n}$, giving it a complexity on the order
of $O(n^3)$, where $n$ is the size of the policy AST's fringe. The table $s$
records the best partition chosen at each step, from which we can
reconstruct the sets of subsequences to compile to each table.
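
A direct transcription of Algorithm 1 into Python might look as follows. This is a sketch under stated assumptions rather than a reference implementation: each leaf table is described by a hypothetical (width in bits, height in entries) pair, the height of a span is the product of its leaves' heights as in Definition 8, and we additionally assume that the width of a span is the sum of its leaves' widths.

```python
from math import inf, prod

def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)

def blocks(width_bits, height_entries):
    """TCAM blocks for one table (Definition 8)."""
    return ceil_div(width_bits, 40) * ceil_div(height_entries, 2048)

def fit(leaves, blocks_per_stage=16):
    """Table fitting over a sequence of leaves, indexed from 1 as in the text.

    leaves: list of (width_bits, height_entries), one pair per table.
    Returns (m, s): m[i, j] is the fewest physical tables for leaves i..j,
    and s[i, j] is the split point recorded when m[i, j] was last improved.
    """
    n = len(leaves)

    def span_tables(i, j):
        span = leaves[i - 1:j]
        width = sum(w for w, _ in span)      # assumption: widths add
        height = prod(h for _, h in span)    # sequencing multiplies heights
        return ceil_div(blocks(width, height), blocks_per_stage)

    m, s = {}, {}
    for i in range(1, n + 1):
        m[i, i] = span_tables(i, i)
    for l in range(2, n + 1):                # length of the subsequence
        for i in range(1, n - l + 2):
            j = i + l - 1
            m[i, j] = inf
            for k in range(i, j):
                # Either split at k, or use one logical table for i..j.
                q = min(m[i, k] + m[k + 1, j], span_tables(i, j))
                if q < m[i, j]:
                    m[i, j] = q
                    s[i, j] = k
    return m, s

# Three hypothetical tables, each matching 40 bits with up to 100 entries.
m, s = fit([(40, 100), (40, 100), (40, 100)])
```

In this example any two adjacent tables fit in one physical table when merged (10 blocks), but merging all three would need 1,467 blocks, so the best fit for the whole sequence is two physical tables.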

It remains to convert a user policy with concurrent and sequential
composition to one without concurrent composition. We apply
a brute-force approach. For each concurrent operator $p \parallel q$, produce
two sequences, $p; q$ and $q; p$. Apply Algorithm 1 to each, and select
the smallest result. There are on the order of $O(2^m)$ sequences,
where $m$ is the number of concurrency operators, and so this final
determinization step runs in $O(2^m n^3)$. Fortunately, in our experience,
policies tend to have on the order of tens of tables, although
the tables themselves may hold many more rules.
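
The enumeration step can be sketched in Python as a recursive generator. The tuple encoding of policies is an assumption of the example, and the sketch only produces the candidate sequences; each candidate would then be handed to the table fitting algorithm.

```python
def sequentializations(policy):
    """Yield every table sequence obtained by ordering each p || q
    as either p; q or q; p.

    Policies are tuples: ("tbl", name), ("seq", p, q), or ("conc", p, q).
    """
    kind = policy[0]
    if kind == "tbl":
        yield [policy[1]]
        return
    for left in sequentializations(policy[1]):
        for right in sequentializations(policy[2]):
            yield left + right              # p; q
            if kind == "conc":
                yield right + left          # q; p

one_conc = ("seq", ("tbl", "x"), ("conc", ("tbl", "y"), ("tbl", "z")))
two_conc = ("conc", ("conc", ("tbl", "a"), ("tbl", "b")), ("tbl", "c"))

orders_one = list(sequentializations(one_conc))
orders_two = list(sequentializations(two_conc))
```

A policy with one concurrency operator yields two candidate sequences and a policy with two yields four, which matches the $2^m$ growth described above.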

7. Related work 

NetCore [7, 11, 12] is a simple compositional language for specify- 
ing static data plane forwarding policy. NetKAT [3] extended Net- 
Core with Kleene star, and a sound and complete equational theory 
for reasoning about networks. Concurrent NetCore shares a com- 
mon core with NetCore (and NetKAT), but adds table specifica- 
tions, concurrency, and a type system. These additions necessitate 
a new approach to the semantics — the denotational techniques used 
for NetCore and NetKAT do not extend easily to models of concur- 
rency. Moreover, these new features make it possible to express 
controller requirements as well as next generation switch hardware 
features. We have focused on specifying the properties of individ- 
ual switches here, so Kleene star is unnecessary, but it would be 
interesting to investigate adding it in the future to facilitate reason- 
ing about networks of multi-table switches. 

Concurrent Kleene Algebra (CKA) [8] is a related calculus that
offers four composition operators: sequential composition,
alternation, disjoint parallel composition and fine-grained concur- 
rent composition. One key difference between NetCore/KAT and 
CKA (as well as other interpretations of Kleene algebra we are 
aware of) is that NetCore interprets "alternation" (disjunction) in 
a non-standard way as "copying parallel" composition. This leads 
to new and interesting interactions with our concurrent composition,
which is most similar to CKA's disjoint parallel composition.
Concurrent NetCore also has a type system and interpretation spe- 
cialized to network programming, while CKA is presented at an 
extremely high level of abstraction. 

Bosshart et al. [4] recently proposed an architecture for programming
OpenFlow 2.0 switches, which we follow in this paper.
Bosshart's configuration language includes components for programming
the packet parser as well as the match-action packet pro-
cessing. We focus on just the match-action processing here, but 
provide a formal semantics and metatheoretic analysis of our work, 
whereas they provide no semantics. We also consider concurrent 
and parallel composition, which they do not. Another important in- 
spiration is the ONF's ongoing work on typed table patterns [2]. 

8. Conclusion 

Concurrent NetCore is at once (a) a language for specifying
routing policies and (b) a language for specifying packet-processing
pipelines. Its novel operational semantics and type system recover
strong reasoning principles. As such, it is an excellent intermediate
language for compiling routing policies: since CNC can express both
high-level policies and low-level pipelines, a multipass compiler can
use the same reasoning principles throughout.